1
|
Chang E, Sung S. Use of SNOMED CT in Large Language Models: Scoping Review. JMIR Med Inform 2024; 12:e62924. [PMID: 39374057 DOI: 10.2196/62924] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/04/2024] [Revised: 07/22/2024] [Accepted: 09/15/2024] [Indexed: 10/08/2024]
Abstract
BACKGROUND Large language models (LLMs) have substantially advanced natural language processing (NLP) capabilities but often struggle with knowledge-driven tasks in specialized domains such as biomedicine. Integrating biomedical knowledge sources such as SNOMED CT into LLMs may enhance their performance on biomedical tasks. However, the methodologies and effectiveness of incorporating SNOMED CT into LLMs have not been systematically reviewed. OBJECTIVE This scoping review aims to examine how SNOMED CT is integrated into LLMs, focusing on (1) the types and components of LLMs being integrated with SNOMED CT, (2) which contents of SNOMED CT are being integrated, and (3) whether this integration improves LLM performance on NLP tasks. METHODS Following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, we searched ACM Digital Library, ACL Anthology, IEEE Xplore, PubMed, and Embase for relevant studies published from 2018 to 2023. Studies were included if they incorporated SNOMED CT into LLM pipelines for natural language understanding or generation tasks. Data on LLM types, SNOMED CT integration methods, end tasks, and performance metrics were extracted and synthesized. RESULTS The review included 37 studies. Bidirectional Encoder Representations from Transformers and its biomedical variants were the most commonly used LLMs. Three main approaches for integrating SNOMED CT were identified: (1) incorporating SNOMED CT into LLM inputs (28/37, 76%), primarily using concept descriptions to expand training corpora; (2) integrating SNOMED CT into additional fusion modules (5/37, 14%); and (3) using SNOMED CT as an external knowledge retriever during inference (5/37, 14%). The most frequent end task was medical concept normalization (15/37, 41%), followed by entity extraction or typing and classification. 
While most studies (17/19, 89%) reported performance improvements after SNOMED CT integration, only about half (19/37, 51%) provided direct comparisons. The reported gains varied widely across different metrics and tasks, ranging from 0.87% to 131.66%. However, some studies showed either no improvement or a decline in certain performance metrics. CONCLUSIONS This review demonstrates diverse approaches for integrating SNOMED CT into LLMs, with a focus on using concept descriptions to enhance biomedical language understanding and generation. While the results suggest potential benefits of SNOMED CT integration, the lack of standardized evaluation methods and comprehensive performance reporting hinders definitive conclusions about its effectiveness. Future research should prioritize consistent reporting of performance comparisons and explore more sophisticated methods for incorporating SNOMED CT's relational structure into LLMs. In addition, the biomedical NLP community should develop standardized evaluation frameworks to better assess the impact of ontology integration on LLM performance.
Affiliation(s)
- Eunsuk Chang
- Republic of Korea Air Force Aerospace Medical Center, Cheongju, Republic of Korea
| | - Sumi Sung
- Department of Nursing Science, Research Institute of Nursing Science, Chungbuk National University, Cheongju, Republic of Korea
|
2
|
Tark A, Estrada LV, Stone PW, Baernholdt M, Buck HG. Systematic review of conceptual and theoretical frameworks used in palliative care and end-of-life care research studies. Palliat Med 2023; 37:10-25. [PMID: 36081200 PMCID: PMC10790406 DOI: 10.1177/02692163221122268] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/20/2023]
Abstract
BACKGROUND Frameworks are the conceptual underpinnings of a study. Both conceptual and theoretical frameworks are often used in palliative and end-of-life care studies to help design, guide, and conduct investigations. While an increasing number of investigators have included frameworks in their studies, to date there has not been a comprehensive review of the frameworks utilized in palliative and end-of-life care research. AIM To summarize conceptual and theoretical frameworks used in palliative and end-of-life care research studies, and to identify which of the eight domains of the National Consensus Project's Clinical Practice Guidelines for Quality Palliative Care (fourth edition) each framework belongs to. DESIGN Systematic review. DATA SOURCES Four electronic databases (EMBASE, the Cumulative Index to Nursing and Allied Health Literature, PsycINFO, and PubMed) were searched from July 2010 to September 2021. RESULTS A total of 2231 citations were retrieved, of which 44 articles met eligibility criteria. Across primary studies, 33,801 study participants were captured. Twenty-six investigators (59.1%) proposed previously unpublished frameworks. In 10 studies, investigators modified existing frameworks, mainly to overcome inherent limitations. In eight studies, investigators utilized existing frameworks referenced in previously published studies. Eight orientations were identified among the 44 frameworks we reviewed (e.g. system, patient, patient-doctor). CONCLUSIONS We examined palliative and end-of-life research studies to identify and characterize the conceptual or theoretical frameworks proposed or utilized. Of the 44 frameworks we reviewed, 21 (47.7%) aligned with a single domain of the Clinical Practice Guidelines, while the rest aligned with two or more of the eight quality palliative care domains.
|
3
|
Alzubi R, Alzoubi H, Katsigiannis S, West D, Ramzan N. Automated Detection of Substance-Use Status and Related Information from Clinical Text. Sensors (Basel, Switzerland) 2022; 22:9609. [PMID: 36559979 PMCID: PMC9783118 DOI: 10.3390/s22249609] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2022] [Revised: 11/21/2022] [Accepted: 11/25/2022] [Indexed: 06/17/2023]
Abstract
This study aims to develop and evaluate an automated system for extracting information related to patient substance use (smoking, alcohol, and drugs) from unstructured clinical text (medical discharge records). The authors propose a four-stage system for the extraction of the substance-use status and related attributes (type, frequency, amount, quit-time, and period). The first stage uses a keyword search technique to detect sentences related to substance use and to exclude unrelated records. In the second stage, an extension of the NegEx negation detection algorithm is developed and employed for detecting the negated records. The third stage involves identifying the temporal status of the substance use by applying windowing and chunking methodologies. Finally, in the fourth stage, regular expressions, syntactic patterns, and keyword search techniques are used in order to extract the substance-use attributes. The proposed system achieves an F1-score of up to 0.99 for identifying substance-use-related records, 0.98 for detecting the negation status, and 0.94 for identifying temporal status. Moreover, F1-scores of up to 0.98, 0.98, 1.00, 0.92, and 0.98 are achieved for the extraction of the amount, frequency, type, quit-time, and period attributes, respectively. Natural Language Processing (NLP) and rule-based techniques are employed efficiently for extracting substance-use status and attributes, with the proposed system being able to detect substance-use status and attributes over both sentence-level and document-level data. Results show that the proposed system outperforms the compared state-of-the-art substance-use identification system on an unseen dataset, demonstrating its generalisability.
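The first two stages described above (keyword filtering, then negation detection) can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the keyword stems, the negation triggers, the five-token window, and all function names are assumptions made for the example, and the NegEx extension used in the study is considerably richer.

```python
import re

# Hypothetical keyword stems and negation triggers, chosen only for illustration.
SUBSTANCE_STEMS = ("smok", "tobacco", "alcohol", "drink", "drug")
NEGATION_TRIGGERS = {"no", "denies", "denied", "never", "without"}


def tokenize(sentence: str) -> list[str]:
    """Lowercase the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def is_substance_sentence(sentence: str) -> bool:
    """Stage 1: keep sentences that contain a substance-use keyword."""
    return any(tok.startswith(SUBSTANCE_STEMS) for tok in tokenize(sentence))


def is_negated(sentence: str, window: int = 5) -> bool:
    """Stage 2 (simplified): a keyword counts as negated when a trigger
    word occurs within `window` tokens before it."""
    toks = tokenize(sentence)
    for i, tok in enumerate(toks):
        if tok.startswith(SUBSTANCE_STEMS):
            preceding = toks[max(0, i - window):i]
            if NEGATION_TRIGGERS.intersection(preceding):
                return True
    return False


def classify(sentence: str) -> str:
    """Combine both stages into a single label."""
    if not is_substance_sentence(sentence):
        return "unrelated"
    return "negated" if is_negated(sentence) else "affirmed"


if __name__ == "__main__":
    for text in (
        "Patient denies smoking and alcohol use.",
        "He smokes one pack per day.",
        "Chest X-ray was unremarkable.",
    ):
        print(text, "->", classify(text))
```

A fixed token window is the simplest way to bound the scope of a negation trigger; rule sets in the NegEx family also handle post-negation phrases and scope-terminating words, which this sketch omits.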
Affiliation(s)
- Raid Alzubi
- Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
| | - Hadeel Alzoubi
- Department of Computer Science, College of Computer Science and Information Technology, King Faisal University, Al-Ahsa 31982, Saudi Arabia
| | - Stamos Katsigiannis
- Department of Computer Science, Durham University, Upper Mountjoy Campus, Stockton Road, Durham DH1 3LE, UK
| | - Daune West
- School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High St., Paisley PA1 2BE, UK
| | - Naeem Ramzan
- School of Computing, Engineering and Physical Sciences, University of the West of Scotland, High St., Paisley PA1 2BE, UK
|
4
|
Zhao G, Gu W, Cai W, Zhao Z, Zhang X, Liu J. MLEE: A method for extracting object-level medical knowledge graph entities from Chinese clinical records. Front Genet 2022; 13:900242. [PMID: 35938002 PMCID: PMC9354090 DOI: 10.3389/fgene.2022.900242] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2022] [Accepted: 06/16/2022] [Indexed: 11/13/2022]
Abstract
As a typical knowledge-intensive industry, the medical field uses knowledge graph technology to construct causal inference calculations, such as “symptom-disease”, “laboratory examination/imaging examination-disease”, and “disease-treatment method”. The continuous expansion of large electronic clinical records provides an opportunity to learn medical knowledge by machine learning. In this process, how to extract entities with a medical logic structure and how to make entity extraction more consistent with the logic of the text content in electronic clinical records are two key issues in building a high-quality medical knowledge graph. In this work, we describe a method for extracting medical entities from real Chinese electronic clinical records. We define a computational architecture named MLEE to extract object-level entities with “object-attribute” dependencies. We conducted experiments based on randomly selected electronic clinical records of 1,000 patients from Shengjing Hospital of China Medical University to verify the effectiveness of the method.
Affiliation(s)
- Genghong Zhao
- School of Computer Science and Engineering Northeastern University, Shenyang, China
- Neusoft Research of Intelligent Healthcare Technology, Shenyang, China
- *Correspondence: Genghong Zhao; Xia Zhang; Jiren Liu
| | - Wenjian Gu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Wei Cai
- Neusoft Research of Intelligent Healthcare Technology, Shenyang, China
| | - Zhiying Zhao
- Department of Clinical Epidemiology, Shengjing Hospital of China Medical University, Shenyang, China
| | - Xia Zhang
- School of Computer Science and Engineering Northeastern University, Shenyang, China
- Neusoft Research of Intelligent Healthcare Technology, Shenyang, China
| | - Jiren Liu
- School of Computer Science and Engineering Northeastern University, Shenyang, China
- Neusoft Corporation, Shenyang, China
|
5
|
Predicted cardiovascular disease risk and prescribing of antihypertensive therapy among patients with hypertension in Australia using MedicineInsight. J Hum Hypertens 2022; 37:370-378. [PMID: 35501358 PMCID: PMC10156591 DOI: 10.1038/s41371-022-00691-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/22/2021] [Revised: 03/30/2022] [Accepted: 04/07/2022] [Indexed: 11/09/2022]
Abstract
Hypertension guidelines recommend that absolute cardiovascular disease (CVD) risk guide the management of hypertensive patients. This study aimed to assess the proportion of patients with diagnosed hypertension with sufficient data to calculate absolute CVD risk and determine whether CVD risk is associated with prescribing of antihypertensive therapies. This was a cross-sectional study using a large national database of electronic medical records of patients attending general practice in 2018 (MedicineInsight). Of 571,492 patients aged 45-74 years without a history of CVD, 251,733 [40.6% (95% CI: 39.8-41.2)] had a recorded hypertension diagnosis. The proportion of patients with sufficient recorded data available to calculate CVD risk was higher for patients diagnosed with hypertension [51.0% (95% CI: 48.0-53.9)] than for patients without a diagnosis of hypertension [38.7% (95% CI: 36.5-41.0)]. Of those patients with sufficient data to calculate CVD risk, 29.3% (95% CI: 28.1-30.6) were at high risk clinically, 6.0% (95% CI: 5.8-6.3) were at high risk based on their CVD risk score, 12.8% (95% CI: 12.5-13.2) at moderate risk and 51.8% (95% CI: 50.8-52.9) at low risk. The overall prevalence of antihypertensive therapy was 60.9% (95% CI: 59.3-62.5). Prescribing was slightly lower in patients at high risk based on their CVD risk score [57.4% (95% CI: 55.4-59.4)] compared with those at low [63.3% (95% CI: 61.9-64.8)] or moderate risk [61.8% (95% CI: 60.2-63.4)] or at high risk clinically [64.1% (95% CI: 61.9-66.3)]. Guideline adherence is suboptimal, and many patients miss out on treatments that may prevent future CVD events.
|
6
|
El-Hasnony IM, Elzeki OM, Alshehri A, Salem H. Multi-Label Active Learning-Based Machine Learning Model for Heart Disease Prediction. Sensors 2022; 22:1184. [PMID: 35161928 PMCID: PMC8839067 DOI: 10.3390/s22031184] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 01/07/2022] [Revised: 01/28/2022] [Accepted: 01/31/2022] [Indexed: 12/02/2022]
Abstract
The rapid growth and adaptation of medical information to identify significant health trends and help with timely preventive care have been recent hallmarks of the modern healthcare data system. Heart disease is the deadliest condition in the developed world. Cardiovascular disease and its complications, including dementia, can be averted with early detection. Further research in this area is needed to prevent strokes and heart attacks. An optimal machine learning model can help achieve this goal with a wealth of healthcare data on heart disease. Heart disease can be predicted and diagnosed using machine-learning-based systems. Active learning (AL) methods improve classification quality by incorporating user–expert feedback with sparsely labelled data. In this paper, five (MMC, Random, Adaptive, QUIRE, and AUDI) selection strategies for multi-label active learning were applied and used for reducing labelling costs by iteratively selecting the most relevant data to query their labels. The selection methods with a label ranking classifier have hyperparameters optimized by a grid search to implement predictive modelling in each scenario for the heart disease dataset. Experimental evaluation includes accuracy and F-score with/without hyperparameter optimization. Results show that the generalization of the learning model beyond the existing data for the optimized label ranking model uses the selection method versus others due to accuracy. However, the selection method was highlighted in regards to the F-score using optimized settings.
Affiliation(s)
- Ibrahim M. El-Hasnony
- Faculty of Computers and Information Sciences, Mansoura University, Mansoura 35516, Egypt
| | - Omar M. Elzeki
- Faculty of Computers and Information Sciences, Mansoura University, Mansoura 35516, Egypt
- Faculty of Computer Science, New Mansoura University, Gamasa 35712, Egypt
- Corresponding author
| | - Ali Alshehri
- Department of Computer Science, University of Tabuk, Tabuk 71491, Saudi Arabia
| | - Hanaa Salem
- Faculty of Engineering, Delta University for Science and Technology, Gamasa 35712, Egypt
|
7
|
Abe T, Sato H, Nakamura K. Extracting Safety-II Factors From an Incident Reporting System by Text Analysis. Cureus 2022; 14:e21528. [PMID: 35223303 PMCID: PMC8863551 DOI: 10.7759/cureus.21528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 01/23/2022] [Indexed: 11/05/2022]
Abstract
Introduction The use of electronic health records (EHRs) has spread worldwide and has helped record huge amounts of data. However, despite the data accumulated in EHRs, especially text data, the information has been underutilized. Our research question and aim are as follows: How can an incident reporting system be used to extract common themes behind incidents, good practices, improved quality, and safety based on the Safety-II/resilient healthcare approach? Methods We extracted data from the electronic incident reporting system of the Yokohama City University Medical Center between April 1, 2016 and March 31, 2018. We utilized natural language processing and text mining to extract concept categories and word patterns. We also used the incident levels as outcomes, as well as classification and regression tree analysis to obtain associated text combinations. Results A total of 17,231 cases were reported through the electronic incident reporting system in our hospital during the study period. Hospital staff have to be prepared for incidents with complex mechanisms in daily practice. The hospital staff tend to focus on individual actions rather than considering a systematic approach. Conclusion Certain combinations of professions and contents may contribute to resilient management. Studies on Safety-II management utilizing clinical information and text records are needed.
|
8
|
Su D, Li Q, Zhang T, Veliz P, Chen Y, He K, Mahajan P, Zhang X. Prediction of acute appendicitis among patients with undifferentiated abdominal pain at emergency department. BMC Med Res Methodol 2022; 22:18. [PMID: 35026994 PMCID: PMC8759254 DOI: 10.1186/s12874-021-01490-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 09/30/2020] [Accepted: 12/08/2021] [Indexed: 11/12/2022]
Abstract
Background Early screening and accurately identifying Acute Appendicitis (AA) among patients with undifferentiated symptoms associated with appendicitis during their emergency visit will improve patient safety and health care quality. The aim of the study was to compare models that predict AA among patients with undifferentiated symptoms at emergency visits using both structured data and free-text data from a national survey. Methods We performed a secondary data analysis on the 2005-2017 United States National Hospital Ambulatory Medical Care Survey (NHAMCS) data to estimate the association between emergency department (ED) patients with the diagnosis of AA, and the demographic and clinical factors present at ED visits during a patient’s ED stay. We used binary logistic regression (LR) and random forest (RF) models incorporating natural language processing (NLP) to predict AA diagnosis among patients with undifferentiated symptoms. Results Among the 40,441 ED patients with assigned International Classification of Diseases (ICD) codes of AA and appendicitis-related symptoms between 2005 and 2017, 655 adults (2.3%) and 256 children (2.2%) had AA. For the LR model identifying AA diagnosis among adult ED patients, the c-statistic was 0.72 (95% CI: 0.69–0.75) for structured variables only, 0.72 (95% CI: 0.69–0.75) for unstructured variables only, and 0.78 (95% CI: 0.76–0.80) when including both structured and unstructured variables. For the LR model identifying AA diagnosis among pediatric ED patients, the c-statistic was 0.84 (95% CI: 0.79–0.89) for including structured variables only, 0.78 (95% CI: 0.72–0.84) for unstructured variables, and 0.87 (95% CI: 0.83–0.91) when including both structured and unstructured variables. The RF method showed similar c-statistic to the corresponding LR model. 
Conclusions We developed predictive models that can predict the AA diagnosis for adult and pediatric ED patients, and the predictive accuracy was improved with the inclusion of NLP elements and approaches. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01490-9.
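The c-statistic reported in the Results is the probability that a randomly chosen patient with the outcome receives a higher predicted risk than a randomly chosen patient without it. The sketch below computes it by direct pairwise comparison; the function name and the toy labels and scores are assumptions for illustration and are unrelated to the NHAMCS data or to the authors' code.

```python
def c_statistic(labels: list[int], scores: list[float]) -> float:
    """Concordance statistic (equivalent to ROC AUC for a binary outcome).

    Counts the share of (positive, negative) pairs in which the positive
    case has the higher score; tied scores contribute one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")

    concordant = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                concordant += 1.0
            elif pos_score == neg_score:
                concordant += 0.5
    return concordant / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy example: 1 = outcome present, 0 = outcome absent.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.1, 0.4, 0.6]
    print(c_statistic(toy_labels, toy_scores))  # 3 of 4 pairs are concordant
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; rank-based formulas give the same value more efficiently on large datasets.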
Affiliation(s)
- Dai Su
- Department of Health Management and Policy, School of Public Health, Capital Medical University, Beijing, China
| | - Qinmengge Li
- Department of Systems, Populations, and Leadership, University of Michigan School of Nursing, Ann Arbor, USA; Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, USA
| | - Tao Zhang
- Department of Epidemiology and Biostatistics, West China School of Public Health School, Sichuan University, Chengdu, China
| | - Philip Veliz
- Department of Systems, Populations, and Leadership, University of Michigan School of Nursing, Ann Arbor, USA
| | - Yingchun Chen
- Department of Health Management, School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China; Research Center for Rural Health Services, Hubei Province Key Research Institute of Humanities and Social Sciences, Wuhan, China
| | - Kevin He
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, USA
| | - Prashant Mahajan
- Department of Emergency Medicine, University of Michigan School of Medicine, Ann Arbor, USA
| | - Xingyu Zhang
- Thomas E. Starzl Transplantation Institute, University of Pittsburgh Medical Center, Pittsburgh, USA.
|
9
|
Different Data Mining Approaches Based Medical Text Data. Journal of Healthcare Engineering 2021; 2021:1285167. [PMID: 34912530 PMCID: PMC8668297 DOI: 10.1155/2021/1285167] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/02/2021] [Accepted: 11/18/2021] [Indexed: 12/15/2022]
Abstract
The amount of medical text data is increasing dramatically. Medical text data record the progress of medicine and contain a large amount of medical knowledge. As natural language, they are semistructured, high-dimensional, and high in volume, and they cannot take part in arithmetic operations directly. Extracting useful knowledge or information from the available data is therefore a very important task, and various data mining techniques can be used to do so. In the current study, we reviewed different approaches to medical text data mining. The advantages and shortcomings of each technique were analyzed with respect to the different processes of medical text data mining. We also explored applications of the algorithms that provide insights to users and enable them to use the resources for specific challenges in medical text data. Further, the main challenges in medical text data mining were discussed. The findings of this paper can help researchers choose reasonable techniques for mining medical text data and make them aware of the main challenges in the field.
|
10
|
Adhikari M, Munusamy A. iCovidCare: Intelligent health monitoring framework for COVID-19 using ensemble random forest in edge networks. Internet of Things (Amsterdam, Netherlands) 2021; 14:100385. [PMID: 38620813 PMCID: PMC7943395 DOI: 10.1016/j.iot.2021.100385] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/09/2021] [Revised: 02/01/2021] [Accepted: 02/24/2021] [Indexed: 06/18/2023]
Abstract
The COVID-19 outbreak is in its growing stage due to the lack of a standard diagnosis for patients. In recent times, various machine learning models have been developed to predict and diagnose the novel coronavirus. However, the existing models cannot make an immediate decision when detecting a COVID-19 patient and cannot handle data from multiple medical sensors for disease prediction. To address these challenges, we propose an intelligent health monitoring and prediction framework, the iCovidCare model, for predicting the health status of COVID-19 patients using the ensemble Random Forest (eRF) technique in edge networks. In the proposed framework, a rule-based policy is designed on the local edge devices to detect a patient's risk factor immediately using monitored temperature sensor values. The real-time health monitoring parameters of different medical sensors are transmitted to the centralized cloud servers for future health prediction of the patients. The standard eRF technique is used to predict the health status of the patients using the proposed data fusion and feature selection strategy, which selects the most significant features for disease prediction. The proposed iCovidCare model is evaluated with a synthetic COVID-19 dataset and compared with standard classification models on various performance metrics to show its effectiveness. The proposed model achieved 95.13% accuracy, which is higher than that of the standard classification models.
Affiliation(s)
- Mainak Adhikari
- Mobile & Cloud Lab, Institute of Computer Science, University of Tartu, Estonia
|
11
|
Large-scale identification of aortic stenosis and its severity using natural language processing on electronic health records. Cardiovascular Digital Health Journal 2021; 2:156-163. [PMID: 35265904 PMCID: PMC8890044 DOI: 10.1016/j.cvdhj.2021.03.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 02/02/2023]
|
12
|
Nandy S, Adhikari M, Balasubramanian V, Menon VG, Li X, Zakarya M. An intelligent heart disease prediction system based on swarm-artificial neural network. Neural Comput Appl 2021. [DOI: 10.1007/s00521-021-06124-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/29/2022]
|
13
|
Yang Z, Xu W, Chen R. A deep learning-based multi-turn conversation modeling for diagnostic Q&A document recommendation. Inf Process Manag 2021. [DOI: 10.1016/j.ipm.2020.102485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
14
|
Turchin A, Florez Builes LF. Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review. J Diabetes Sci Technol 2021; 15:553-560. [PMID: 33736486 PMCID: PMC8120048 DOI: 10.1177/19322968211000831] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/14/2022]
Abstract
BACKGROUND Real-world evidence research plays an increasingly important role in diabetes care. However, a large fraction of real-world data are "locked" in narrative format. Natural language processing (NLP) technology offers a solution for analysis of narrative electronic data. METHODS We conducted a systematic review of studies of NLP technology focused on diabetes. Articles published prior to June 2020 were included. RESULTS We included 38 studies in the analysis. The majority (24; 63.2%) described only development of NLP tools; the remainder used NLP tools to conduct clinical research. A large fraction (17; 44.7%) of studies focused on identification of patients with diabetes; the rest covered a broad range of subjects that included hypoglycemia, lifestyle counseling, diabetic kidney disease, insulin therapy and others. The mean F1 score for all studies where it was available was 0.882. It tended to be lower (0.817) in studies of more linguistically complex concepts. Seven studies reported findings with potential implications for improving delivery of diabetes care. CONCLUSION Research in NLP technology to study diabetes is growing quickly, although challenges (e.g. in analysis of more linguistically complex concepts) remain. Its potential to deliver evidence on treatment and improving quality of diabetes care is demonstrated by a number of studies. Further growth in this area would be aided by deeper collaboration between developers and end-users of natural language processing tools as well as by broader sharing of the tools themselves and related resources.
Affiliation(s)
- Alexander Turchin
- Brigham and Women’s Hospital, Boston, MA, USA
- Alexander Turchin, MD, MS, Brigham and Women’s Hospital, 221 Longwood Avenue, Boston, MA 02115, USA.
|
15
|
Brunekreef TE, Otten HG, van den Bosch SC, Hoefer IE, van Laar JM, Limper M, Haitjema S. Text Mining of Electronic Health Records Can Accurately Identify and Characterize Patients With Systemic Lupus Erythematosus. ACR Open Rheumatol 2021; 3:65-71. [PMID: 33434395 PMCID: PMC7882527 DOI: 10.1002/acr2.11211] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/12/2020] [Accepted: 11/16/2020] [Indexed: 12/20/2022]
Abstract
Objective Electronic health records (EHR) are increasingly being recognized as a major source of data reusable for medical research and quality monitoring, although patient identification and assessment of symptoms (characterization) remain challenging, especially in complex diseases such as systemic lupus erythematosus (SLE). Current coding systems are unable to assess information recorded in the physician’s free‐text notes. This study shows that text mining can be used as a reliable alternative. Methods In a multidisciplinary research team of data scientists and medical experts, a text mining algorithm on 4607 patient records was developed to assess the diagnosis of 14 different immune‐mediated inflammatory diseases and the presence of 18 different symptoms in the EHR. The text mining algorithm included key words in the EHR, while mining the context for exclusion phrases. The accuracy of the text mining algorithm was assessed by manually checking the EHR of 100 random patients suspected of having SLE for diagnoses and symptoms and comparing the outcome with the outcome of the text mining algorithm. Results After evaluation of 100 patient records, the text mining algorithm had a sensitivity of 96.4% and a specificity of 93.3% in assessing the presence of SLE. The algorithm detected potentially life‐threatening symptoms (nephritis, pleuritis) with good sensitivity (80%‐82%) and high specificity (97%‐97%). Conclusion We present a text mining algorithm that can accurately identify and characterize patients with SLE using routinely collected data from the EHR. Our study shows that using text mining, data from the EHR can be reused in research and quality control.
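The validation step described above compares the algorithm's output with manual chart review and summarizes agreement as sensitivity and specificity. A minimal sketch of that calculation follows; the function name and the toy data are hypothetical and are not the study's confusion matrix, which the abstract does not report.

```python
def sensitivity_specificity(reference: list[bool], predicted: list[bool]) -> tuple[float, float]:
    """Return (sensitivity, specificity) of `predicted` against `reference`.

    `reference` holds the manual review result per patient (True = condition
    present) and `predicted` holds the text mining result for the same patients.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    pairs = list(zip(reference, predicted))
    true_pos = sum(1 for ref, pred in pairs if ref and pred)
    false_neg = sum(1 for ref, pred in pairs if ref and not pred)
    true_neg = sum(1 for ref, pred in pairs if not ref and not pred)
    false_pos = sum(1 for ref, pred in pairs if not ref and pred)

    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one positive and one negative reference case")

    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data: 10 patients with the condition, 10 without.
    manual_review = [True] * 10 + [False] * 10
    text_mining = [True] * 8 + [False] * 2 + [False] * 9 + [True] * 1
    print(sensitivity_specificity(manual_review, text_mining))
```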
Affiliation(s)
- Tammo E Brunekreef
- University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Henny G Otten
- University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Imo E Hoefer
- University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Jacob M van Laar
- University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Maarten Limper
- University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
- Saskia Haitjema
- University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands
|
16
|
Cheerkoot-Jalim S, Khedo KK. A systematic review of text mining approaches applied to various application areas in the biomedical domain. JOURNAL OF KNOWLEDGE MANAGEMENT 2020. [DOI: 10.1108/jkm-09-2019-0524] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/11/2022]
Abstract
Purpose
This work shows the results of a systematic literature review on biomedical text mining. The purpose of this study is to identify the different text mining approaches used in different application areas of the biomedical domain, the common tools used and the challenges of biomedical text mining as compared to generic text mining algorithms. This study will be of value to biomedical researchers by allowing them to correlate text mining approaches to specific biomedical application areas. Implications for future research are also discussed.
Design/methodology/approach
The review was conducted following the principles of the Kitchenham method. A number of research questions were first formulated, followed by the definition of the search strategy. The papers were then selected based on a list of assessment criteria. Each of the papers was analyzed, and information relevant to the research questions was extracted.
Findings
It was found that researchers have mostly harnessed data sources such as electronic health records, biomedical literature, social media and health-related forums. The most common text mining technique was natural language processing using tools such as MetaMap and Unstructured Information Management Architecture, alongside the use of medical terminologies such as Unified Medical Language System. The main application area was the detection of adverse drug events. Challenges identified included the need to deal with huge amounts of text, the heterogeneity of the different data sources, the duality of meaning of words in biomedical text and the amount of noise introduced mainly from social media and health-related forums.
Originality/value
To the best of the authors’ knowledge, other reviews in this area have focused on either specific techniques, specific application areas or specific data sources. The results of this review will help researchers to correlate most relevant and recent advances in text mining approaches to specific biomedical application areas by providing an up-to-date and holistic view of work done in this research area. The use of emerging text mining techniques has great potential to spur the development of innovative applications, thus considerably impacting on the advancement of biomedical research.
|
17
|
Fu S, Chen D, He H, Liu S, Moon S, Peterson KJ, Shen F, Wang L, Wang Y, Wen A, Zhao Y, Sohn S, Liu H. Clinical concept extraction: A methodology review. J Biomed Inform 2020; 109:103526. [PMID: 32768446 PMCID: PMC7746475 DOI: 10.1016/j.jbi.2020.103526] [Citation(s) in RCA: 60] [Impact Index Per Article: 15.0] [Received: 03/02/2020] [Revised: 07/30/2020] [Accepted: 08/02/2020] [Indexed: 01/11/2023]
Abstract
BACKGROUND Concept extraction, a subdomain of natural language processing (NLP) with a focus on extracting concepts of interest, has been adopted to computationally extract clinical information from text for a wide range of applications ranging from clinical decision support to care quality improvement. OBJECTIVES In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications. METHODS Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted for retrieving EHR-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library. RESULTS A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications were discussed in this review.
Affiliation(s)
- Sunyang Fu
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States; University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States.
- David Chen
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Huan He
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Sijia Liu
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Sungrim Moon
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Kevin J Peterson
- Department of Information Technology, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States; University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States.
- Feichen Shen
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Liwei Wang
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Yanshan Wang
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Andrew Wen
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Yiqing Zhao
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Sunghwan Sohn
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States; University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States.
|
18
|
Huang HL, Hong SH, Tsai YC. Approaches to text mining for analyzing treatment plan of quit smoking with free-text medical records: A PRISMA-compliant meta-analysis. Medicine (Baltimore) 2020; 99:e20999. [PMID: 32702841 PMCID: PMC7373589 DOI: 10.1097/md.0000000000020999] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/04/2022] Open
Abstract
BACKGROUND Smoking is a complex behavior associated with multiple factors such as personality, environment, genetics, and emotions. Text data are a rich source of information. However, pure text data requires substantial human resources and time to extract and apply the knowledge, resulting in many details not being discovered and used. This study proposes a novel approach that explores a text mining flow to capture the behavior of smokers quitting tobacco from their free-text medical records. More importantly, the paper examines the impact of these changes on smokers. The goal is to help smokers quit smoking. The study population included adult patients >20 years of age who consulted the medical center's smoking cessation outpatient clinic from January to December 2016. A total of 246 patients visited the clinic in the study period. After excluding incomplete medical records and patients lost to follow-up, there were 141 patients included in the final analysis. There are 141 valid data points for patients who were only treated once and patients with empty medical records. Two independent review authors will make the study selection based on the study eligibility criteria. Our participants are from all the patients that were involved in this study and the staff of Division of Family Medicine, National Taiwan University Hospital. Interventions and study appraisal are not required. METHODS The paper develops an algorithm for analyzing smoking cessation treatment plans documented in free-text medical records. The approach involves the development of an information extraction flow that uses a combination of data mining techniques, including text mining. It can be used not only to help others quit smoking but also on other medical records with similar data elements. The Apriori associations that our algorithm derived from the text mining revealed several important clinical implications for physicians during smoking cessation. 
For example, there was an apparent association between nicotine replacement therapy (NRT) and other medications such as Inderal, Rivotril, Dogmatyl, and Solaxin. Inderal and Rivotril are frequently used as anxiolytics in patients with anxiety disorders. RESULTS Finally, we find that the rules associating NRT combination with blood tests may imply that the use of NRT combination therapy in smokers with chronic illness may result in lower abstinence. Further large-scale surveys comparing varenicline or bupropion with NRT combination in smokers with a chronic disease are warranted. The Apriori algorithm suffers from some weaknesses despite being transparent and straightforward. The main limitation is the time wasted holding a vast number of candidate sets when there are many frequent itemsets, a low minimum support, or large itemsets. CONCLUSION In the paper, the most visible areas for the therapeutic application of text mining are the integration and transfer of advances made in basic sciences, as well as a better understanding of the processes involved in smoking cessation. Text mining may also be useful for supporting decision-making processes associated with smoking cessation. The systematic review is not registered.
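To make the Apriori terminology in this abstract concrete, the sketch below computes support and confidence for item pairs and applies the Apriori pruning step (only items that are frequent on their own are combined into candidate pairs). The patient records are invented for illustration and are not data from the study; only the first two levels of the algorithm are shown.

```python
from itertools import combinations

# Hypothetical per-patient item sets extracted from free text; not the study's data.
records = [
    {"NRT", "Inderal"},
    {"NRT", "Inderal", "Rivotril"},
    {"NRT", "Rivotril"},
    {"varenicline"},
]


def support(itemset, records):
    """Fraction of records that contain every item in `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)


def frequent_pairs(records, min_support):
    """Pairs meeting `min_support`, built only from individually frequent items."""
    items = sorted(set().union(*records))
    frequent_items = [i for i in items if support({i}, records) >= min_support]
    result = {}
    for pair in combinations(frequent_items, 2):
        s = support(set(pair), records)
        if s >= min_support:
            result[pair] = s
    return result


def confidence(antecedent, consequent, records):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, records) / support(antecedent, records)


pairs = frequent_pairs(records, min_support=0.5)
```

Lowering `min_support` enlarges the set of frequent items and therefore the number of candidate pairs, which is the candidate-set cost that the abstract names as the main limitation.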
Affiliation(s)
- Hsien-Liang Huang
- Division of Family Medicine, National Taiwan University Hospital, Zhongzheng Dist
- Shi-Hao Hong
- Computer Science and Technology, HeFei University of Technology, Hefei, Anhui Province
- Yun-Cheng Tsai
- School of Big Data Management, Soochow University, Shihlin District, Taipei City, Taiwan (R.O.C.)
|
19
|
Usama M, Ahmad B, Xiao W, Hossain MS, Muhammad G. Self-attention based recurrent convolutional neural network for disease prediction using healthcare data. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2020; 190:105191. [PMID: 31753591 DOI: 10.1016/j.cmpb.2019.105191] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 07/18/2019] [Revised: 10/29/2019] [Accepted: 11/05/2019] [Indexed: 06/10/2023]
Abstract
BACKGROUND AND OBJECTIVE Computer-aided disease diagnosis from medical data through deep learning methods has become a wide area of research. Existing work on analyzing clinical text data in the medical domain, which contains a large quantity of useful information about patients with disease, benefits early-stage disease diagnosis. However, these benefits are not fully achieved with traditional rule-based and classical machine learning methods, which are unable to handle unstructured clinical text, and no single method can handle all the challenges of analyzing unstructured text. Moreover, not all words in clinical text contribute equally to the prediction of disease. Therefore, there is a need to develop a neural model that solves these clinical application problems. METHODS Considering the above problems, this paper first presents a self-attention based recurrent convolutional neural network (RCNN) model using real-life clinical text data collected from a hospital in Wuhan, China. This model automatically learns high-level semantic features from clinical text by using bidirectional recurrent connections within convolution. Second, to deal with other clinical text challenges, we combine the ability of the RCNN with the self-attention mechanism. Self-attention focuses the model on the essential convolved features that carry effective meaning in the clinical text by calculating the probability of each convolved feature through softmax. RESULTS The proposed model is evaluated on a real-life hospital dataset, using accuracy and recall as measurement metrics. Experimental results show that the proposed model reaches an accuracy of up to 95.71%, which is better than many existing methods for cerebral infarction disease. 
CONCLUSIONS This article presented a self-attention based RCNN model that combines the RCNN with a self-attention mechanism for the prediction of cerebral infarction disease. The obtained results show that the presented model predicts cerebral infarction disease risk better than many existing methods. The same model can also be used for the prediction of other disease risks.
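The phrase "calculating the probability of each convolved feature through softmax" can be illustrated with a small attention-pooling sketch. Scoring each feature by a dot product with a single `query` vector is an assumption made here for illustration; the abstract does not specify the paper's scoring function or architecture details.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1 (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Weight each feature vector by its softmax probability and sum them.

    `features` stands in for the convolved features of a clinical text;
    `query` is a hypothetical learned vector used to score them.
    """
    scores = [sum(f_d * q_d for f_d, q_d in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, pooled


uniform_weights, uniform_pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
focused_weights, _ = attention_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

With an all-zero query every feature receives the same weight, while a query aligned with the first feature shifts probability mass toward it, which is the sense in which attention lets some features contribute more to the prediction than others.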
Affiliation(s)
- Mohd Usama
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
- Belal Ahmad
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
- Wenjing Xiao
- School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China.
- M Shamim Hossain
- Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia.
- Ghulam Muhammad
- Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia.
|
20
|
Bagheri A, Sammani A, van der Heijden PGM, Asselbergs FW, Oberski DL. ETM: Enrichment by topic modeling for automated clinical sentence classification to detect patients’ disease history. J Intell Inf Syst 2020. [DOI: 10.1007/s10844-020-00605-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/24/2022]
Abstract
Given the rapid rate at which text data are being digitally gathered in the medical domain, there is a growing need for automated tools that can analyze clinical notes and classify their sentences in electronic health records (EHRs). This study uses EHR texts to detect patients’ disease history from clinical sentences. However, in EHRs, sentences are less topic-focused and shorter than those in the general domain, which leads to the sparsity of co-occurrence patterns and the lack of semantic features. To tackle this challenge, current approaches for clinical sentence classification are dependent on external information to improve classification performance. However, this is implausible owing to a lack of universal medical dictionaries. This study proposes the ETM (enrichment by topic modeling) algorithm, based on latent Dirichlet allocation, to smoothen the semantic representations of short sentences. The ETM enriches text representation by incorporating probability distributions generated by an unsupervised algorithm into it. It considers the length of the original texts to enhance representation by using an internal knowledge acquisition procedure. When it comes to clinical predictive modeling, interpretability improves the acceptance of the model. Thus, for clinical sentence classification, the ETM approach employs an initial TF-IDF (term frequency-inverse document frequency) representation, where we use the support vector machine and neural network algorithms for the classification task. We conducted three sets of experiments on a data set consisting of clinical cardiovascular notes from the Netherlands to test the sentence classification performance of the proposed method in comparison with prevalent approaches. The results show that the proposed ETM approach outperformed state-of-the-art baselines.
|
21
|
Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform 2019; 7:e12239. [PMID: 31066697 PMCID: PMC6528438 DOI: 10.2196/12239] [Citation(s) in RCA: 226] [Impact Index Per Article: 45.2] [Received: 09/17/2018] [Revised: 03/04/2019] [Accepted: 03/24/2019] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions on the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset. OBJECTIVE The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives. METHODS Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using "clinical notes," "natural language processing," and "chronic disease" and their variations as keywords to maximize coverage of the articles. RESULTS Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14). 
This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus of NLP. The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes. CONCLUSIONS Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.
Affiliation(s)
- Seyedmostafa Sheikhalishahi
- eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
- Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
- Riccardo Miotto
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Joel T Dudley
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Alberto Lavelli
- NLP Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
- Fabio Rinaldi
- Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Venet Osmani
- eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
|
22
|
Abstract
Medical data is one of the most rewarding and yet most complicated kinds of data to analyze. How can healthcare providers use modern data analytics tools and technologies to analyze and create value from complex data? Data analytics promises to efficiently discover valuable patterns by analyzing large amounts of unstructured, heterogeneous, non-standard, and incomplete healthcare data. It not only forecasts but also helps in decision making, and it is increasingly seen as a breakthrough whose goal is to improve the quality of patient care and reduce healthcare costs. The aim of this study is to provide a comprehensive and structured overview of extensive research on the advancement of data analytics methods for disease prevention. This review first introduces disease prevention and its challenges, followed by traditional prevention methodologies. We summarize state-of-the-art data analytics algorithms used for classification of disease, clustering (unusually high incidence of a particular disease), anomaly detection (detection of disease), and association, as well as their respective advantages, drawbacks, and guidelines for selection of a specific model, followed by a discussion of recent developments and successful applications of disease prevention methods. The article concludes with open research challenges and recommendations.
|
23
|
Helgheim BI, Maia R, Ferreira JC, Martins AL. Merging Data Diversity of Clinical Medical Records to Improve Effectiveness. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2019; 16:ijerph16050769. [PMID: 30832447 PMCID: PMC6427263 DOI: 10.3390/ijerph16050769] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/30/2018] [Revised: 02/04/2019] [Accepted: 02/24/2019] [Indexed: 12/13/2022]
Abstract
Medicine is a knowledge area continuously experiencing changes. Every day, discoveries and procedures are tested with the goal of providing improved service and quality of life to patients. With the evolution of computer science, multiple areas experienced an increase in productivity with the implementation of new technical solutions. Medicine is no exception. Providing healthcare services in the future will involve the storage and manipulation of large volumes of data (big data) from medical records, requiring the integration of different data sources, for a multitude of purposes, such as prediction, prevention, personalization, participation, and becoming digital. Data integration and data sharing will be essential to achieve these goals. Our work focuses on the development of a framework process for the integration of data from different sources to increase its usability potential. We integrated data from an internal hospital database, external data, and also structured data resulting from natural language processing (NLP) applied to electronic medical records. An extract, transform, and load (ETL) process was used to merge different data sources into a single one, allowing more effective use of these data and, eventually, contributing to more efficient use of the available resources.
Affiliation(s)
- Berit I Helgheim
- Logistics, Molde University College, Molde, NO-6410 Molde, Norway.
- Rui Maia
- DEI, Instituto Superior Técnico, Lisboa, 1049-001 Portugal.
- Joao C Ferreira
- Instituto Universitário de Lisboa (ISCTE-IUL), ISTAR-IUL, Lisbon 1649-026, Portugal.
- Ana Lucia Martins
- Instituto Universitário de Lisboa (ISCTE-IUL), BRU-IUL, Lisbon 1649-026, Portugal.
|
24
|
Utilizing electronic health records to predict multi-type major adverse cardiovascular events after acute coronary syndrome. Knowl Inf Syst 2018. [DOI: 10.1007/s10115-018-1270-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022]
|
25
|
Hinton W, Liyanage H, McGovern A, Liaw ST, Kuziemsky C, Munro N, de Lusignan S. Measuring Quality of Healthcare Outcomes in Type 2 Diabetes from Routine Data: a Seven-nation Survey Conducted by the IMIA Primary Health Care Working Group. Yearb Med Inform 2017; 26:201-208. [PMID: 28480471 PMCID: PMC6250989 DOI: 10.15265/iy-2017-005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022] Open
Abstract
Background: The Institute of Medicine framework defines six dimensions of quality for healthcare systems: (1) safety, (2) effectiveness, (3) patient centeredness, (4) timeliness of care, (5) efficiency, and (6) equity. Large health datasets provide an opportunity to assess quality in these areas. Objective: To perform an international comparison of the measurability of the delivery of these aims, in people with type 2 diabetes mellitus (T2DM) from large datasets. Method: We conducted a survey to assess healthcare outcomes data quality of existing databases and disseminated this through professional networks. We examined the data sources used to collect the data, frequency of data uploads, and data types used for identifying people with T2DM. We compared data completeness across the six areas of healthcare quality, using selected measures pertinent to T2DM management. Results: We received 14 responses from seven countries (Australia, Canada, Italy, the Netherlands, Norway, Portugal, Turkey and the UK). Most databases reported frequent data uploads and would be capable of near real time analysis of healthcare quality. The majority of recorded data related to safety (particularly medication adverse events) and treatment efficacy (glycaemic control and microvascular disease). Data potentially measuring equity was less well recorded. Recording levels were lowest for patient-centred care, timeliness of care, and system efficiency, with the majority of databases containing no data in these areas. Databases using primary care sources had higher data quality across all areas measured. Conclusion: Data quality could be improved particularly in the areas of patient-centred care, timeliness, and efficiency. Primary care derived datasets may be most suited to healthcare quality assessment.
Affiliation(s)
- W. Hinton
- Clinical Informatics & Health Outcomes Research Group, Department of Clinical & Experimental Medicine, University of Surrey, Guildford, Surrey, UK
- H. Liyanage
- Clinical Informatics & Health Outcomes Research Group, Department of Clinical & Experimental Medicine, University of Surrey, Guildford, Surrey, UK
- A. McGovern
- Clinical Informatics & Health Outcomes Research Group, Department of Clinical & Experimental Medicine, University of Surrey, Guildford, Surrey, UK
- S.-T. Liaw
- School of Public Health & Community Medicine, UNSW Medicine, Australia
- C. Kuziemsky
- Telfer School of Management, University of Ottawa, Ottawa, Ontario, Canada
- N. Munro
- Clinical Informatics & Health Outcomes Research Group, Department of Clinical & Experimental Medicine, University of Surrey, Guildford, Surrey, UK
- S. de Lusignan
- Clinical Informatics & Health Outcomes Research Group, Department of Clinical & Experimental Medicine, University of Surrey, Guildford, Surrey, UK
|
26
|
Gonzalez-Hernandez G, Sarker A, O’Connor K, Savova G. Capturing the Patient's Perspective: a Review of Advances in Natural Language Processing of Health-Related Text. Yearb Med Inform 2017; 26:214-227. [PMID: 29063568 PMCID: PMC6250990 DOI: 10.15265/iy-2017-029] [Citation(s) in RCA: 63] [Impact Index Per Article: 9.0] [Indexed: 11/24/2022] Open
Abstract
Background: Natural Language Processing (NLP) methods are increasingly being utilized to mine knowledge from unstructured health-related texts. Recent advances in noisy text processing techniques are enabling researchers and medical domain experts to go beyond the information encapsulated in published texts (e.g., clinical trials and systematic reviews) and structured questionnaires, and obtain perspectives from other unstructured sources such as Electronic Health Records (EHRs) and social media posts. Objectives: To review the recently published literature discussing the application of NLP techniques for mining health-related information from EHRs and social media posts. Methods: Literature review included the research published over the last five years based on searches of PubMed, conference proceedings, and the ACM Digital Library, as well as on relevant publications referenced in papers. We particularly focused on the techniques employed on EHRs and social media data. Results: A set of 62 studies involving EHRs and 87 studies involving social media matched our criteria and were included in this paper. We present the purposes of these studies, outline the key NLP contributions, and discuss the general trends observed in the field, the current state of research, and important outstanding problems. Conclusions: Over the recent years, there has been a continuing transition from lexical and rule-based systems to learning-based approaches, because of the growth of annotated data sets and advances in data science. For EHRs, publicly available annotated data is still scarce and this acts as an obstacle to research progress. On the contrary, research on social media mining has seen a rapid growth, particularly because the large amount of unlabeled data available via this resource compensates for the uncertainty inherent to the data. 
Effective mechanisms to filter out noise and for mapping social media expressions to standard medical concepts are crucial and latent research problems. Shared tasks and other competitive challenges have been driving factors behind the implementation of open systems, and they are likely to play an imperative role in the development of future systems.
Affiliation(s)
- G. Gonzalez-Hernandez: Department of Epidemiology, Biostatistics, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- A. Sarker: Department of Epidemiology, Biostatistics, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- K. O’Connor: Department of Epidemiology, Biostatistics, and Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- G. Savova: Boston Children’s Hospital and Harvard Medical School, Boston, MA, USA
|
27
|
Buchan K, Filannino M, Uzuner Ö. Automatic prediction of coronary artery disease from clinical narratives. J Biomed Inform 2017; 72:23-32. [PMID: 28663072 DOI: 10.1016/j.jbi.2017.06.019] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 12/14/2016] [Revised: 06/19/2017] [Accepted: 06/22/2017] [Indexed: 11/25/2022]
Abstract
Coronary Artery Disease (CAD) is not only the most common form of heart disease, but also the leading cause of death in both men and women (Coronary Artery Disease: MedlinePlus, 2015). We present a system that is able to automatically predict whether patients develop coronary artery disease based on their narrative medical histories, i.e., clinical free text. Although the free text in medical records has been used in several studies for identifying risk factors of coronary artery disease, to the best of our knowledge our work marks the first attempt at automatically predicting development of CAD. We tackle this task on a small corpus of diabetic patients. The size of this corpus makes it important to limit the number of features in order to avoid overfitting. We propose an ontology-guided approach to feature extraction, and compare it with two classic feature selection techniques. Our system achieves state-of-the-art performance of 77.4% F1 score.
Affiliation(s)
- Kevin Buchan: Department of Information Science, State University of New York at Albany, NY, USA
- Michele Filannino: Department of Computer Science, State University of New York at Albany, NY, USA
- Özlem Uzuner: Department of Computer Science, State University of New York at Albany, NY, USA
|
28
|
Ross EG, Shah NH, Dalman RL, Nead KT, Cooke JP, Leeper NJ. The use of machine learning for the identification of peripheral artery disease and future mortality risk. J Vasc Surg 2016; 64:1515-1522.e3. [PMID: 27266594 PMCID: PMC5079774 DOI: 10.1016/j.jvs.2016.04.026] [Citation(s) in RCA: 77] [Impact Index Per Article: 9.6] [Received: 02/10/2016] [Accepted: 04/04/2016] [Indexed: 12/16/2022]
Abstract
OBJECTIVE A key aspect of the precision medicine effort is the development of informatics tools that can analyze and interpret "big data" sets in an automated and adaptive fashion while providing accurate and actionable clinical information. The aims of this study were to develop machine learning algorithms for the identification of disease and the prognostication of mortality risk and to determine whether such models perform better than classical statistical analyses. METHODS Focusing on peripheral artery disease (PAD), patient data were derived from a prospective, observational study of 1755 patients who presented for elective coronary angiography. We employed multiple supervised machine learning algorithms and used diverse clinical, demographic, imaging, and genomic information in a hypothesis-free manner to build models that could identify patients with PAD and predict future mortality. Comparison was made to standard stepwise linear regression models. RESULTS Our machine-learned models outperformed stepwise logistic regression models both for the identification of patients with PAD (area under the curve, 0.87 vs 0.76, respectively; P = .03) and for the prediction of future mortality (area under the curve, 0.76 vs 0.65, respectively; P = .10). Both machine-learned models were markedly better calibrated than the stepwise logistic regression models, thus providing more accurate disease and mortality risk estimates. CONCLUSIONS Machine learning approaches can produce more accurate disease classification and prediction models. These tools may prove clinically useful for the automated identification of patients with highly morbid diseases for which aggressive risk factor management can improve outcomes.
Affiliation(s)
- Elsie Gyang Ross: Division of Vascular Surgery, Stanford Health Care, Stanford, Calif
- Nigam H Shah: Center for Biomedical Informatics Research, Stanford University, Stanford, Calif
- Ronald L Dalman: Division of Vascular Surgery, Stanford Health Care, Stanford, Calif
- Kevin T Nead: Department of Radiation Oncology, University of Pennsylvania, Philadelphia, Pa
- John P Cooke: Department of Cardiovascular Sciences, Houston Methodist Research Institute, Houston, Tex; Center for Cardiovascular Regeneration, Houston Methodist DeBakey Heart and Vascular Center, Houston, Tex
- Nicholas J Leeper: Division of Vascular Surgery, Stanford Health Care, Stanford, Calif
|
29
|
Utilizing Chinese Admission Records for MACE Prediction of Acute Coronary Syndrome. Int J Environ Res Public Health 2016; 13:ijerph13090912. [PMID: 27649220 PMCID: PMC5036745 DOI: 10.3390/ijerph13090912] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 04/22/2016] [Revised: 08/09/2016] [Accepted: 08/31/2016] [Indexed: 11/18/2022]
Abstract
Background: Clinical major adverse cardiovascular event (MACE) prediction of acute coronary syndrome (ACS) is important for a number of applications including physician decision support, quality of care assessment, and efficient healthcare service delivery on ACS patients. Admission records, as typical media to contain clinical information of patients at the early stage of their hospitalizations, provide significant potential to be explored for MACE prediction in a proactive manner. Methods: We propose a hybrid approach for MACE prediction by utilizing a large volume of admission records. Firstly, both a rule-based medical language processing method and a machine learning method (i.e., Conditional Random Fields (CRFs)) are developed to extract essential patient features from unstructured admission records. After that, state-of-the-art supervised machine learning algorithms are applied to construct MACE prediction models from data. Results: We comparatively evaluate the performance of the proposed approach on a real clinical dataset consisting of 2930 ACS patient samples collected from a Chinese hospital. Our best model achieved 72% AUC in MACE prediction. In comparison of the performance between our models and two well-known ACS risk score tools, i.e., GRACE and TIMI, our learned models obtain better performances with a significant margin. Conclusions: Experimental results reveal that our approach can obtain competitive performance in MACE prediction. The comparison of classifiers indicates the proposed approach has a competitive generality with datasets extracted by different feature extraction methods. Furthermore, our MACE prediction model obtained a significant improvement by comparison with both GRACE and TIMI. It indicates that using admission records can effectively provide MACE prediction service for ACS patients at the early stage of their hospitalizations.
|
30
|
Towards Interactive Medical Content Delivery Between Simulated Body Sensor Networks and Practical Data Center. J Med Syst 2016; 40:214. [DOI: 10.1007/s10916-016-0575-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/30/2016] [Accepted: 08/11/2016] [Indexed: 11/26/2022]
|
31
|
Kumar V, Stubbs A, Shaw S, Uzuner Ö. Creation of a new longitudinal corpus of clinical narratives. J Biomed Inform 2015; 58 Suppl:S6-S10. [PMID: 26433122 PMCID: PMC4978168 DOI: 10.1016/j.jbi.2015.09.018] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 09/05/2015] [Revised: 09/22/2015] [Accepted: 09/23/2015] [Indexed: 10/23/2022]
Abstract
The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured a new longitudinal corpus of 1304 records representing 296 diabetic patients. The corpus contains three cohorts: patients who have a diagnosis of coronary artery disease (CAD) in their first record, and continue to have it in subsequent records; patients who do not have a diagnosis of CAD in the first record, but develop it by the last record; patients who do not have a diagnosis of CAD in any record. This paper details the process used to select records for this corpus and provides an overview of novel research uses for this corpus. This corpus is the only annotated corpus of longitudinal clinical narratives currently available for research to the general research community.
Affiliation(s)
- Vishesh Kumar: Dartmouth-Hitchcock Medical Center, Division of Cardiology, Lebanon, NH, USA
- Amber Stubbs: School of Library and Information Science, Simmons College, Boston, MA, USA
- Stanley Shaw: Harvard Medical School, Boston, MA 02115, USA; Center for Systems Biology, Massachusetts General Hospital, Boston, MA 02114, USA
- Özlem Uzuner: Department of Information Studies, State University of New York at Albany, Albany, NY, USA
|
32
|
Uzuner Ö, Stubbs A. Practical applications for natural language processing in clinical research: The 2014 i2b2/UTHealth shared tasks. J Biomed Inform 2015; 58 Suppl:S1-S5. [PMID: 26515500 PMCID: PMC4978169 DOI: 10.1016/j.jbi.2015.10.007] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 09/21/2015] [Revised: 10/08/2015] [Accepted: 10/14/2015] [Indexed: 12/29/2022]
Affiliation(s)
- Özlem Uzuner: Department of Information Studies, State University of New York at Albany, Albany, NY, USA
- Amber Stubbs: School of Library and Information Science, Simmons College, Boston, MA, USA
|
33
|
Identification and Progression of Heart Disease Risk Factors in Diabetic Patients from Longitudinal Electronic Health Records. Biomed Res Int 2015; 2015:636371. [PMID: 26380290 PMCID: PMC4561944 DOI: 10.1155/2015/636371] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 01/16/2015] [Revised: 07/07/2015] [Accepted: 07/08/2015] [Indexed: 11/17/2022]
Abstract
Heart disease is the leading cause of death worldwide. Therefore, assessing the risk of its occurrence is a crucial step in predicting serious cardiac events. Identifying heart disease risk factors and tracking their progression is a preliminary step in heart disease risk assessment. A large number of studies have reported the use of risk factor data collected prospectively. Electronic health record systems are a great resource of the required risk factor data. Unfortunately, most of the valuable information on risk factor data is buried in the form of unstructured clinical notes in electronic health records. In this study, we present an information extraction system to extract related information on heart disease risk factors from unstructured clinical notes using a hybrid approach. The hybrid approach employs both machine learning and rule-based clinical text mining techniques. The developed system achieved an overall microaveraged F-score of 0.8302.
|