1
Berloco F, Ciavarella S, Colucci S, Grieco LA, Guarini A, Zaccaria GM. ARGO 2.0: a Hybrid NLP/ML Framework for Diagnosis Standardization. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2023; 2023:1-4. [PMID: 38083100 DOI: 10.1109/embc40787.2023.10340022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/18/2023]
Abstract
A relevant problem in medicine is the standardization of the diagnosis associated with a clinical case. Although diagnosis formulation is an intrinsically subjective and uncertain process, its standardization may benefit from digital solutions that automate the routines underlying such a decision. In this work, we propose ARGO 2.0: a framework for the development of decision support systems for diagnosis formulation. The framework can read free-text reports and store their clinically relevant information as personalized electronic Case Report Forms. A hybrid strategy, exploiting the synergy of Natural Language Processing and Machine Learning techniques, is used to automatically suggest a diagnosis in a standardized fashion. ARGO 2.0 has been designed to be template-independent and easily tailored to specific medical fields. We here demonstrate its feasibility in hemo-lymphopathology by detailing its implementation, which is the object of an ongoing validation campaign in a standing medical institute. ARGO 2.0 achieved an average accuracy of 95.07%, an average precision of 94.85%, an average recall of 96.31%, and an F-score of 95.32% on the test set, outperforming both of its embedded components, based on Natural Language Processing and Machine Learning.
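The four metrics reported above can be computed from reference and predicted diagnosis labels. The following is a minimal sketch, assuming macro-averaging over diagnosis classes with one-vs-rest counts per class; the abstract says "average" but does not state the averaging scheme, so this is an assumption. The function name `macro_metrics` and the toy labels are hypothetical and are not taken from the ARGO 2.0 study.

```python
def macro_metrics(y_true, y_pred):
    """Macro-averaged accuracy, precision, recall, and F-score.

    Each class is scored one-vs-rest, then the per-class values are averaged.
    """
    classes = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    totals = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f_score": 0.0}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        tn = n - tp - fp - fn
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        totals["accuracy"] += (tp + tn) / n
        totals["precision"] += precision
        totals["recall"] += recall
        totals["f_score"] += f_score
    return {name: value / len(classes) for name, value in totals.items()}


# Toy labels (hypothetical, for illustration only).
reference = ["A", "A", "B", "C"]
predicted = ["A", "B", "B", "C"]
print(macro_metrics(reference, predicted))
```

Macro-averaging weights every diagnosis class equally, which matters when some diagnoses are rare; a micro-average would instead be dominated by the most frequent classes.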
2
Lin FPY, Salih OS, Scott N, Jameson MB, Epstein RJ. Development and Validation of a Machine Learning Approach Leveraging Real-World Clinical Narratives as a Predictor of Survival in Advanced Cancer. JCO Clin Cancer Inform 2022; 6:e2200064. [DOI: 10.1200/cci.22.00064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
PURPOSE Predicting short-term mortality in patients with advanced cancer remains challenging. Whether digitalized clinical text can be used to build models to enhance survival prediction in this population is unclear. MATERIALS AND METHODS We conducted a single-center retrospective cohort study in patients with advanced solid tumors. Clinical correspondence authored by oncologists at the first patient encounter was extracted from the electronic medical records. Machine learning (ML) models were trained using narratives from the derivation cohort, before being tested on a temporal validation cohort at the same site. Performance was benchmarked against Eastern Cooperative Oncology Group performance status (PS), comparing ML models alone (comparison 1) or in combination with PS (comparison 2), assessed by areas under receiver operating characteristic curves (AUCs) for predicting vital status at 11 time points from 2 to 52 weeks. RESULTS ML models were built on the derivation cohort (4,791 patients from 2001 to April 2017) and tested on the validation cohort of 726 patients (May 2017-June 2019). In 441 patients (61%) where clinical narratives were available and PS was documented, ML models outperformed the predictivity of PS (mean AUC improvement, 0.039, P < .001, comparison 1). Inclusion of both clinical text and PS in ML models resulted in further improvement in prediction accuracy over PS with a mean AUC improvement of 0.050 (P < .001, comparison 2); the AUC was > 0.80 at all assessed time points for models incorporating clinical text. Exploratory analysis of oncologists' narratives revealed recurring descriptors correlating with survival, including referral patterns, mobility, physical functions, and concomitant medications. CONCLUSION Applying ML to oncologists' narratives with or without including patients' PS significantly improved survival prediction to 12 months, suggesting the utility of clinical text in building prognostic support tools.
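The comparison above rests on the difference in AUC between two predictors of vital status at a given time point. A minimal sketch of that calculation follows, using the rank-based definition of AUC (the probability that a randomly chosen patient who died receives a higher risk score than a randomly chosen survivor). The abstract does not describe how the authors computed AUC, and the function name `roc_auc` and all scores below are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC via pairwise comparison of positives and negatives (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for six patients at one time point (1 = died).
died = [1, 1, 1, 0, 0, 0]
ps_only = [0.8, 0.4, 0.6, 0.5, 0.3, 0.2]    # performance status alone
text_and_ps = [0.9, 0.6, 0.7, 0.5, 0.3, 0.2]  # text-based model plus PS

improvement = roc_auc(died, text_and_ps) - roc_auc(died, ps_only)
print(f"AUC improvement: {improvement:.3f}")
```

Repeating this at each of the assessed time points and averaging the differences would give a mean AUC improvement of the kind reported; the pairwise form is quadratic in cohort size, so a sorted-rank implementation is preferable for large cohorts.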
Affiliation(s)
- Frank Po-Yen Lin
  - Kinghorn Centre for Clinical Genomics, Garvan Institute of Medical Research, Darlinghurst, Australia
  - NHMRC Clinical Trials Centre, Sydney University, Camperdown, Australia
  - Department of Medical Oncology, Waikato Hospital, Hamilton, New Zealand
  - School of Clinical Medicine, University of New South Wales, Sydney, Australia
- Osama S.M. Salih
  - Department of Medical Oncology, Waikato Hospital, Hamilton, New Zealand
  - Auckland City Hospital, Auckland, New Zealand
- Nina Scott
  - Waikato Clinical Campus, University of Auckland, Hamilton, New Zealand
- Michael B. Jameson
  - Department of Medical Oncology, Waikato Hospital, Hamilton, New Zealand
  - Waikato Clinical Campus, University of Auckland, Hamilton, New Zealand
- Richard J. Epstein
  - School of Clinical Medicine, University of New South Wales, Sydney, Australia
  - Cancer Research Division, Garvan Institute of Medical Research, Sydney, Australia
  - New Hope Cancer Centre, Beijing United Hospital, Beijing, China
3
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022; 6:e2200006. [PMID: 35917480 PMCID: PMC9470142 DOI: 10.1200/cci.22.00006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2022] [Revised: 03/18/2022] [Accepted: 06/15/2022] [Indexed: 11/20/2022]
Abstract
PURPOSE The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and to facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care by using the Minimal Common Oncology Data Elements (mCODE), which is a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we aim to assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting said data elements. METHODS The main literature databases were searched to retrieve cancer-related NLP articles that were written in English and published between January 2010 and September 2020. After the retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data under four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS A total of 123 publications were ultimately selected and included in our analysis. As expected, we found that cancer research and patient care require some data elements beyond mCODE. Transparency and reproducibility are not sufficient in NLP methods, and inconsistency in NLP evaluation exists. CONCLUSION We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers for wide adoption of cancer NLP were identified and discussed.
Affiliation(s)
- Liwei Wang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sunyang Fu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Andrew Wen
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Xiaoyang Ruan
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Huan He
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sijia Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sungrim Moon
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Michelle Mai
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Irbaz B. Riaz
  - Department of Hematology/Oncology, Mayo Clinic, Scottsdale, AZ
- Nan Wang
  - Department of Computer Science and Engineering, College of Science and Engineering, University of Minnesota, Minneapolis, MN
- Ping Yang
  - Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ
- Hua Xu
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
- Jeremy L. Warner
  - Departments of Medicine (Hematology/Oncology), Vanderbilt University, Nashville, TN
  - Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
- Hongfang Liu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
4
Zaccaria GM, Colella V, Colucci S, Clemente F, Pavone F, Vegliante MC, Esposito F, Opinto G, Scattone A, Loseto G, Minoia C, Rossini B, Quinto AM, Angiulli V, Grieco LA, Fama A, Ferrero S, Moia R, Di Rocco A, Quaglia FM, Tabanelli V, Guarini A, Ciavarella S. Electronic case report forms generation from pathology reports by ARGO, automatic record generator for onco-hematology. Sci Rep 2021; 11:23823. [PMID: 34893665 PMCID: PMC8664934 DOI: 10.1038/s41598-021-03204-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/01/2021] [Accepted: 11/23/2021] [Indexed: 12/04/2022]
Abstract
The unstructured nature of Real-World (RW) data from onco-hematological patients and the scarce accessibility of integrated systems restrain the use of RW information for research purposes. Natural Language Processing (NLP) might help in transposing unstructured reports into standardized electronic health records. We exploited NLP to develop an automated tool, named ARGO (Automatic Record Generator for Onco-hematology), to recognize information from pathology reports and populate electronic case report forms (eCRFs) pre-implemented in REDCap. ARGO was applied to hemo-lymphopathology reports of diffuse large B-cell, follicular, and mantle cell lymphomas, and assessed for accuracy (A), precision (P), recall (R) and F1-score (F) on internal (n = 239) and external (n = 93) report series. 326 (98.2%) reports were converted into corresponding eCRFs. Overall, ARGO showed high performance in capturing (1) report identification number (all metrics > 90%), (2) biopsy date (all metrics > 90% in both series), (3) specimen type (A of 86.6% and 91.4%, P of 98.5% and 100.0%, F of 92.5% and 95.5%, and R of 87.2% and 91.4% for internal and external series, respectively), (4) diagnosis (100% of P with A, R and F of 90% in both series). We developed and validated a generalizable tool that generates structured eCRFs from real-life pathology reports.
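To make the report-to-eCRF idea concrete, the sketch below pulls the four fields evaluated above (report identification number, biopsy date, specimen type, diagnosis) out of a free-text report with regular expressions and a small term dictionary. This is not ARGO's implementation, which the abstract does not describe; the report layout, field names, patterns, and the `report_to_ecrf` function are all hypothetical, and real reports would need site-specific and language-specific rules.

```python
import re

# Hypothetical patterns for an English-language report layout.
FIELD_PATTERNS = {
    "report_id": re.compile(r"Report\s+(?:No\.|number)\s*:?\s*([A-Z0-9/-]+)", re.IGNORECASE),
    "biopsy_date": re.compile(r"Biopsy\s+date\s*:?\s*(\d{2}/\d{2}/\d{4})", re.IGNORECASE),
    "specimen_type": re.compile(r"Specimen\s*:?\s*([^\n.]+)", re.IGNORECASE),
}

# Map diagnosis phrases to standardized codes (the three lymphomas named above).
DIAGNOSIS_TERMS = {
    "diffuse large b-cell lymphoma": "DLBCL",
    "follicular lymphoma": "FL",
    "mantle cell lymphoma": "MCL",
}


def report_to_ecrf(text):
    """Return a dict of eCRF fields; a field is None when nothing matches."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group(1).strip() if match else None
    lowered = text.lower()
    record["diagnosis"] = next(
        (code for term, code in DIAGNOSIS_TERMS.items() if term in lowered), None
    )
    return record


# Invented example report, for illustration only.
report = (
    "Report No. 2021/0457\n"
    "Biopsy date: 14/03/2021\n"
    "Specimen: lymph node, excisional biopsy\n"
    "Diagnosis: Follicular lymphoma, grade 2."
)
print(report_to_ecrf(report))
```

Returning `None` for unmatched fields keeps missing information explicit, so a downstream form can flag it for manual review instead of silently storing a wrong value.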
Affiliation(s)
- Gian Maria Zaccaria
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Vito Colella
  - Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
- Simona Colucci
  - Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
- Felice Clemente
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Fabio Pavone
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Maria Carmela Vegliante
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Flavia Esposito
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
  - Department of Mathematics, University of Bari Aldo Moro, Bari, Italy
- Giuseppina Opinto
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Anna Scattone
  - Pathology Department, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
- Giacomo Loseto
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Carla Minoia
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Bernardo Rossini
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Angela Maria Quinto
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Vito Angiulli
  - Clinical Engineering Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Bari, Italy
- Luigi Alfredo Grieco
  - Department of Electrical and Information Engineering, Politecnico of Bari, Bari, Italy
- Angelo Fama
  - Hematology, Azienda USL - IRCCS Di Reggio Emilia, Reggio Emilia, Italy
- Simone Ferrero
  - Division of Hematology 1, AOU "Città Della Salute e Della Scienza di Torino", Torino, Italy
  - Department of Molecular Biotechnologies and Health Sciences, University of Torino, Torino, Italy
- Riccardo Moia
  - Division of Hematology, Azienda Ospedaliero-Universitaria Maggiore Della Carità Di Novara, Novara, Italy
- Alice Di Rocco
  - Unit of Hematology, Azienda Ospedaliero-Universitaria Policlinico Umberto I, Roma, Italy
- Valentina Tabanelli
  - Division of Diagnostic Haematopathology, European Institute of Oncology, IRCCS, Milano, Italy
- Attilio Guarini
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
- Sabino Ciavarella
  - Hematology and Cell Therapy Unit, IRCCS Istituto Tumori 'Giovanni Paolo II', Viale Orazio Flacco, 65, Bari, Italy
5
Karlsson A, Ellonen A, Irjala H, Väliaho V, Mattila K, Nissi L, Kytö E, Kurki S, Ristamäki R, Vihinen P, Laitinen T, Ålgars A, Jyrkkiö S, Minn H, Heervä E. Impact of deep learning-determined smoking status on mortality of cancer patients: never too late to quit. ESMO Open 2021; 6:100175. [PMID: 34091262 PMCID: PMC8182259 DOI: 10.1016/j.esmoop.2021.100175] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/19/2021] [Revised: 05/12/2021] [Accepted: 05/13/2021] [Indexed: 12/22/2022]
Abstract
Background Persistent smoking after cancer diagnosis is associated with increased overall mortality (OM) and cancer mortality (CM). According to the 2020 Surgeon General's report, smoking cessation may reduce CM but supporting evidence is limited. Use of deep learning-based modeling that enables universal natural language processing of medical narratives to acquire population-based real-life smoking data may help overcome the challenge. We assessed the effect of smoking status and within-1-year smoking cessation on CM by an in-house adapted freely available language processing algorithm. Materials and methods This cross-sectional real-world study included 29 823 patients diagnosed with cancer in 2009-2018 in Southwest Finland. The medical narrative, International Classification of Diseases-10th edition codes, histology, cancer treatment records, and death certificates were combined. Over 162 000 sentences describing tobacco smoking behavior were analyzed with ULMFiT and BERT algorithms. Results The language model classified the smoking status of 23 031 patients. Recent quitters had reduced CM [hazard ratio (HR) 0.80 (0.74-0.87)] and OM [HR 0.78 (0.72-0.84)] compared to persistent smokers. Compared to never smokers, persistent smokers had increased CM in head and neck, gastro-esophageal, pancreatic, lung, prostate, and breast cancer and Hodgkin's lymphoma, irrespective of age, comorbidities, performance status, or presence of metastatic disease. Increased CM was also observed in smokers with colorectal cancer, men with melanoma or bladder cancer, and lymphoid and myeloid leukemia, but no longer independently of the abovementioned covariates. Specificity and sensitivity were 96%/96%, 98%/68%, and 88%/99% for never, former, and current smokers, respectively, being essentially the same with both models. Conclusions Deep learning can be used to classify large amounts of smoking data from the medical narrative with good accuracy.
The results highlight the detrimental effects of persistent smoking in oncologic patients and emphasize that smoking cessation should always be an essential element of patient counseling. Deep learning/universal language modeling was used to extract smoking status of cancer patients. Good accuracy was observed. Those who continue smoking after cancer diagnosis had increased CM compared to never smokers. Recent within-1-year cessation reduced this mortality. Detrimental effects of smoking were observed in multiple types of early- and advanced-stage cancers, including the elderly. We conclude that smoking cessation support should always be included in cancer care.
Affiliation(s)
- A Karlsson
  - Auria Biobank, University of Turku and Turku University Hospital, Turku, Finland
- A Ellonen
  - University of Turku, Turku, Finland
  - Department of Oncology, Turku University Hospital, Turku, Finland
  - FICAN West Cancer Centre, Turku, Finland
- H Irjala
  - University of Turku, Turku, Finland
  - FICAN West Cancer Centre, Turku, Finland
  - Department of Otorhinolaryngology-Head and Neck Surgery, Turku University Hospital, Turku, Finland
- V Väliaho
  - Department of Oncology, Turku University Hospital, Turku, Finland
  - FICAN West Cancer Centre, Turku, Finland
- K Mattila
  - Department of Oncology, Turku University Hospital, Turku, Finland
  - FICAN West Cancer Centre, Turku, Finland
- L Nissi
  - University of Turku, Turku, Finland
  - Department of Oncology, Turku University Hospital, Turku, Finland
- E Kytö
  - University of Turku, Turku, Finland
  - FICAN West Cancer Centre, Turku, Finland
  - Department of Otorhinolaryngology-Head and Neck Surgery, Turku University Hospital, Turku, Finland
- S Kurki
  - Auria Biobank, University of Turku and Turku University Hospital, Turku, Finland
  - University of Turku, Turku, Finland
- R Ristamäki
  - Department of Oncology, Turku University Hospital, Turku, Finland
  - FICAN West Cancer Centre, Turku, Finland
- P Vihinen
  - FICAN West Cancer Centre, Turku, Finland
- T Laitinen
  - Hospital Administration, Tampere University Hospital, Tampere, Finland
- A Ålgars
  - University of Turku, Turku, Finland
  - Department of Oncology, Turku University Hospital, Turku, Finland
  - FICAN West Cancer Centre, Turku, Finland
- S Jyrkkiö
  - Department of Oncology, Turku University Hospital, Turku, Finland
  - FICAN West Cancer Centre, Turku, Finland
- H Minn
  - University of Turku, Turku, Finland
  - Department of Oncology, Turku University Hospital, Turku, Finland
  - FICAN West Cancer Centre, Turku, Finland
- E Heervä
  - University of Turku, Turku, Finland
  - Department of Oncology, Turku University Hospital, Turku, Finland
  - FICAN West Cancer Centre, Turku, Finland
6
Huang HL, Hong SH, Tsai YC. Approaches to text mining for analyzing treatment plan of quit smoking with free-text medical records: A PRISMA-compliant meta-analysis. Medicine (Baltimore) 2020; 99:e20999. [PMID: 32702841 PMCID: PMC7373589 DOI: 10.1097/md.0000000000020999] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/04/2022]
Abstract
BACKGROUND Smoking is a complex behavior associated with multiple factors such as personality, environment, genetics, and emotions. Text data are a rich source of information. However, extracting and applying knowledge from pure text data requires substantial human resources and time, so many details are never discovered or used. This study proposes a novel approach that explores a text mining flow to capture the behavior of smokers quitting tobacco from their free-text medical records. More importantly, the paper examines the impact of these changes on smokers. The goal is to help smokers quit smoking. The study population included adult patients >20 years of age who consulted the medical center's smoking cessation outpatient clinic from January to December 2016. A total of 246 patients visited the clinic in the study period. After excluding patients with incomplete medical records or who were lost to follow-up, 141 patients were included in the final analysis. There are 141 valid data points for patients who were treated only once and patients with empty medical records. Two independent review authors will make the study selection based on the study eligibility criteria. Our participants are from all the patients that were involved in this study and the staff of the Division of Family Medicine, National Taiwan University Hospital. Interventions and study appraisal are not required. METHODS The paper develops an algorithm for analyzing smoking cessation treatment plans documented in free-text medical records. The approach involves the development of an information extraction flow that uses a combination of data mining techniques, including text mining. It can be used not only to help others quit smoking but also for other medical records with similar data elements. The Apriori associations of our algorithm from the text mining revealed several important clinical implications for physicians during smoking cessation.
For example, there was an apparent association between nicotine replacement therapy (NRT) and other medications such as Inderal, Rivotril, Dogmatyl, and Solaxin. Inderal and Rivotril are frequently used as anxiolytics in patients with anxiety disorders. RESULTS Finally, we find that the rules associating NRT combination with blood tests may imply that the use of NRT combination therapy in smokers with chronic illness may result in lower abstinence. Further large-scale surveys comparing varenicline or bupropion with NRT combination in smokers with a chronic disease are warranted. The Apriori algorithm suffers from some weaknesses despite being transparent and straightforward. The main limitation is the time cost of holding a vast number of candidate sets when there are many frequent itemsets, a low minimum support, or large itemsets. CONCLUSION In the paper, the most visible areas for the therapeutic application of text mining are the integration and transfer of advances made in basic sciences, as well as a better understanding of the processes involved in smoking cessation. Text mining may also be useful for supporting decision-making processes associated with smoking cessation. The systematic review is not registered.
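The Apriori step mentioned above can be illustrated with a short level-wise implementation: frequent itemsets of size k are joined into candidates of size k+1, candidates with any infrequent subset are pruned, and the rest are kept if their support meets the threshold. This is a generic sketch of the textbook algorithm, not the authors' code; the function names and the five toy "visit" records are invented for illustration and are not study data.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori: return {itemset: support} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = {item for t in transactions for item in t}
    level = {s for s in (frozenset([i]) for i in items) if support(s) >= min_support}
    result = {}
    k = 1
    while level:
        result.update({s: support(s) for s in level})
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c for c in candidates if support(c) >= min_support}
    return result


def confidence(freq, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from stored supports."""
    a = frozenset(antecedent)
    return freq[a | frozenset(consequent)] / freq[a]


# Invented medication sets per visit (toy data, not from the study).
visits = [
    {"NRT", "Inderal"},
    {"NRT", "Inderal", "Rivotril"},
    {"NRT", "Rivotril"},
    {"varenicline"},
    {"NRT", "Inderal"},
]
freq = frequent_itemsets(visits, min_support=0.4)
print(round(confidence(freq, {"NRT"}, {"Inderal"}), 2))
```

The limitation noted in the abstract is visible in the join step: lowering `min_support` lets more itemsets survive each level, and the number of candidates that must be generated and counted can grow very quickly with the number of distinct items.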
Affiliation(s)
- Hsien-Liang Huang
  - Division of Family Medicine, National Taiwan University Hospital, Zhongzheng Dist
- Shi-Hao Hong
  - Computer Science and Technology, HeFei University of Technology, Hefei, Anhui Province
- Yun-Cheng Tsai
  - School of Big Data Management, Soochow University, Shihlin District, Taipei City, Taiwan (R.O.C.)
7
Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform 2019; 7:e12239. [PMID: 31066697 PMCID: PMC6528438 DOI: 10.2196/12239] [Citation(s) in RCA: 204] [Impact Index Per Article: 40.8] [Received: 09/17/2018] [Revised: 03/04/2019] [Accepted: 03/24/2019] [Indexed: 01/08/2023]
Abstract
Background Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset. Objective The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives. Methods Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using “clinical notes,” “natural language processing,” and “chronic disease” and their variations as keywords to maximize coverage of the articles. Results Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14).
This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus of NLP. The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes. Conclusions Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.
Affiliation(s)
- Seyedmostafa Sheikhalishahi
  - eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
  - Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
- Riccardo Miotto
  - Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Joel T Dudley
  - Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Alberto Lavelli
  - NLP Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
- Fabio Rinaldi
  - Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
- Venet Osmani
  - eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
8
Abstract
Text fields in electronic medical records (EMR) contain information on important factors that influence health outcomes; however, they are underutilized in clinical decision making due to their unstructured nature. We analyzed 6497 inpatient surgical cases with 719,308 free text notes from the Le Bonheur Children’s Hospital EMR. We used a text mining approach on preoperative notes to obtain a text-based risk score to predict death within 30 days of surgery. In addition, we evaluated the performance of a hybrid model that included the text-based risk score along with structured data pertaining to clinical risk factors. The C-statistic of a logistic regression model with five-fold cross-validation significantly improved from 0.76 to 0.92 when text-based risk scores were included in addition to structured data. We conclude that preoperative free text notes in EMR include significant information that can predict adverse surgery outcomes.
9
McNutt TR, Bowers M, Cheng Z, Han P, Hui X, Moore J, Robertson S, Mayo C, Voong R, Quon H. Practical data collection and extraction for big data applications in radiotherapy. Med Phys 2018; 45:e863-e869. [DOI: 10.1002/mp.12817] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/09/2017] [Revised: 01/15/2018] [Accepted: 01/25/2018] [Indexed: 12/25/2022]
Affiliation(s)
- Todd R. McNutt
  - School of Medicine; Radiation Oncology; Johns Hopkins University; Baltimore MD 21231 USA
- Michael Bowers
  - School of Medicine; Radiation Oncology; Johns Hopkins University; Baltimore MD 21231 USA
- Zhi Cheng
  - School of Medicine; Radiation Oncology; Johns Hopkins University; Baltimore MD 21231 USA
- Peijin Han
  - School of Medicine; Radiation Oncology; Johns Hopkins University; Baltimore MD 21231 USA
- Xuan Hui
  - Epidemiology; University of Chicago; Chicago IL 60637 USA
- Joseph Moore
  - School of Medicine; Radiation Oncology; Johns Hopkins University; Baltimore MD 21231 USA
- Scott Robertson
  - Radiation Oncology; Wellspan York Hospital; York PA 17403 USA
- Charles Mayo
  - Radiation Oncology; University of Michigan; Ann Arbor MI 48109 USA
- Ranh Voong
  - School of Medicine; Radiation Oncology; Johns Hopkins University; Baltimore MD 21231 USA
- Harry Quon
  - School of Medicine; Radiation Oncology; Johns Hopkins University; Baltimore MD 21231 USA
10
Gonzalez-Hernandez G, Sarker A, O’Connor K, Greene C. Advances in Text Mining and Visualization for Precision Medicine. Pacific Symposium on Biocomputing 2018; 23:559-565. [PMID: 29218914 PMCID: PMC7466870] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
According to the National Institutes of Health (NIH), precision medicine is "an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person." Although the text mining community has explored this realm for some years, the official endorsement and funding launched in 2015 with the Precision Medicine Initiative are beginning to bear fruit. This session sought to elicit participation of researchers with a strong background in text mining and/or visualization who are actively collaborating with bench scientists and clinicians for the deployment of integrative approaches in precision medicine that could impact scientific discovery and advance the vision of precision medicine as a universal, accessible approach at the point of care.