1
Guével E, Priou S, Flicoteaux R, Lamé G, Bey R, Tannier X, Cohen A, Chatellier G, Daniel C, Tournigand C, Kempf E. Development of a natural language processing model for deriving breast cancer quality indicators: A cross-sectional, multicenter study. Rev Epidemiol Sante Publique 2023; 71:102189. [PMID: 37972522 DOI: 10.1016/j.respe.2023.102189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2023] [Revised: 10/13/2023] [Accepted: 10/13/2023] [Indexed: 11/19/2023]
Abstract
OBJECTIVES Medico-administrative data hold promise for automating the calculation of Healthcare Quality and Safety Indicators (HQSIs). Nevertheless, not all relevant indicators can be calculated with these data alone. The objectives of our feasibility study were 1) to analyze the availability of data sources; 2) to analyze the availability of each indicator's elementary variables; and 3) to apply natural language processing to automatically retrieve such information. METHOD We performed a multicenter cross-sectional observational feasibility study on the clinical data warehouse of Assistance Publique - Hôpitaux de Paris (AP-HP). We studied the management of breast cancer patients treated at AP-HP between January 2019 and June 2021, and the quality indicators published by the European Society of Breast Cancer Specialists, using claims data from the Programme de Médicalisation du Système d'Information (PMSI) and pathology reports. For each indicator, we calculated the number (%) of patients for whom all necessary data sources were available, and the number (%) of patients for whom all elementary variables were available in the sources and for whom the related HQSI was computable. To extract useful data from the free-text reports, we developed and validated dedicated rule-based algorithms, whose performance was assessed with recall, precision, and F1-score. RESULTS Out of 5,785 female patients diagnosed with breast cancer (60.9 years, IQR [50.0-71.9]), 5,147 (89.0%) had procedures related to breast cancer recorded in the PMSI, and 3,732 (72.5%) had at least one surgery. Out of the 34 key indicators, 9 could be calculated with the PMSI alone, and 6 others became computable using the data from pathology reports. Ten elementary variables were needed to calculate the 6 indicators combining the PMSI and pathology reports. The necessary sources were available for 58.8% to 94.6% of patients, depending on the indicator. The extraction algorithms developed had an average accuracy of 76.5% (min-max [32.7%-93.3%]), an average precision of 77.7% [10.0%-97.4%], and an average sensitivity of 71.6% [2.8%-100.0%]. Once these algorithms were applied, the variables needed to calculate the indicators were extracted for 2% to 88% of patients, depending on the indicator. DISCUSSION The availability of medical reports in the electronic health records, the availability of the elementary variables within the reports, and the performance of the extraction algorithms limit the population for which the indicators can be calculated. CONCLUSIONS The automated calculation of quality indicators from electronic health records is a prospect that comes up against many practical obstacles.
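The rule-based extraction and its evaluation described in this abstract can be illustrated with a minimal Python sketch. The regular expression, the `extract_size_mm` helper, and the sample sentences are hypothetical and are not the authors' rules, which the abstract does not give; the sketch only shows the general shape of a rule applied to free text and scored with precision, recall, and F1-score.

```python
import re

# Hypothetical rule (an assumption, not taken from the study):
# a number followed by "mm" or "cm", with "." or "," as decimal separator.
SIZE_RULE = re.compile(r"(\d+(?:[.,]\d+)?)\s*(mm|cm)\b", re.IGNORECASE)


def extract_size_mm(report_text):
    """Return the first size found in the text, in millimetres, or None."""
    match = SIZE_RULE.search(report_text)
    if match is None:
        return None
    value = float(match.group(1).replace(",", "."))
    return value * 10 if match.group(2).lower() == "cm" else value


def precision_recall_f1(predicted, reference):
    """Score extracted values against manually annotated reference values."""
    pairs = list(zip(predicted, reference))
    tp = sum(p is not None and p == r for p, r in pairs)
    fp = sum(p is not None and p != r for p, r in pairs)
    fn = sum(r is not None and p != r for p, r in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up sentences and annotations, for illustration only.
reports = [
    "Invasive carcinoma measuring 1,5 cm.",
    "Tumor size: 12 mm.",
    "Size not reported.",
    "Lesion of about twenty millimetres.",
]
reference = [15.0, 12.0, None, 20.0]
predicted = [extract_size_mm(text) for text in reports]
precision, recall, f1 = precision_recall_f1(predicted, reference)
```

In this toy set the rule misses the size written out in words, so recall drops while precision stays perfect, which is the kind of gap between the two metrics that the abstract's min-max ranges suggest.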
Affiliation(s)
- Etienne Guével
- Assistance Publique - Hôpitaux de Paris, Innovation and Data, IT Department, 75012 Paris, France
- Sonia Priou
- Université Paris-Saclay, CentraleSupélec, Laboratoire Génie Industriel, 91192 Gif-sur-Yvette, France
- Rémi Flicoteaux
- Assistance Publique - Hôpitaux de Paris, Department of medical information, 75012 Paris, France
- Guillaume Lamé
- Université Paris-Saclay, CentraleSupélec, Laboratoire Génie Industriel, 91192 Gif-sur-Yvette, France
- Romain Bey
- Assistance Publique - Hôpitaux de Paris, Innovation and Data, IT Department, 75012 Paris, France
- Xavier Tannier
- Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, 75006 Paris, France
- Ariel Cohen
- Assistance Publique - Hôpitaux de Paris, Innovation and Data, IT Department, 75012 Paris, France
- Gilles Chatellier
- Université Paris Cité, Department of medical informatics, Assistance Publique Hôpitaux de Paris, Centre-Université de Paris (APHP-CUP), 75015 Paris, France
- Christel Daniel
- Assistance Publique - Hôpitaux de Paris, Innovation and Data, IT Department, 75012 Paris, France
- Christophe Tournigand
- Université Paris Est Créteil, Assistance Publique - Hôpitaux de Paris, Department of medical oncology, Henri Mondor and Albert Chenevier University Hospital, 94000 Créteil, France
- Emmanuelle Kempf
- Université Sorbonne Paris Nord, Laboratoire d'Informatique Médicale et d'Ingénierie des Connaissances pour la e-Santé, LIMICS, 75006 Paris, France; Université Paris Est Créteil, Assistance Publique - Hôpitaux de Paris, Department of medical oncology, Henri Mondor and Albert Chenevier University Hospital, 94000 Créteil, France.
2
Seong D, Choi YH, Shin SY, Yi BK. Deep learning approach to detection of colonoscopic information from unstructured reports. BMC Med Inform Decis Mak 2023; 23:28. [PMID: 36750932 PMCID: PMC9903463 DOI: 10.1186/s12911-023-02121-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2022] [Accepted: 01/23/2023] [Indexed: 02/09/2023]
Abstract
BACKGROUND Colorectal cancer is a leading cause of cancer deaths. Several screening tests, such as colonoscopy, can be used to find polyps or colorectal cancer. Colonoscopy reports are often written in unstructured narrative text. The information embedded in the reports can be used for various purposes, including colorectal cancer risk prediction, follow-up recommendation, and quality measurement. However, the availability and accessibility of unstructured text data are still insufficient despite the large amounts of accumulated data. We aimed to develop and apply deep learning-based natural language processing (NLP) methods to detect colonoscopic information. METHODS This study applied several deep learning-based NLP models to colonoscopy reports. Approximately 280,668 colonoscopy reports were extracted from the clinical data warehouse of Samsung Medical Center. For 5,000 reports, procedural information and colonoscopic findings were manually annotated with 17 labels. We compared long short-term memory (LSTM) and BioBERT models to select the one with the best performance for colonoscopy reports, which was the bidirectional LSTM with conditional random fields. Then, we applied pre-trained word embedding using large unlabeled data (280,668 reports) to the selected model. RESULTS The NLP model with pre-trained word embedding performed better for most labels than the model with one-hot encoding. The F1 scores for colonoscopic findings were: 0.9564 for lesions, 0.9722 for locations, 0.9809 for shapes, 0.9720 for colors, 0.9862 for sizes, and 0.9717 for numbers. CONCLUSIONS This study applied deep learning-based clinical NLP models to extract meaningful information from colonoscopy reports. The method in this study achieved promising results that demonstrate it can be applied to various practical purposes.
Affiliation(s)
- Donghyeong Seong
- Samsung Advanced Institute for Health Sciences and Technology (SAIHST), Sungkyunkwan University, Seoul, 06355 Republic of Korea
- Yoon Ho Choi
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355 Republic of Korea
- Soo-Yong Shin
- Department of Digital Health, SAIHST, Sungkyunkwan University, Seoul, 06355 Republic of Korea; Research Institute for Future Medicine, Samsung Medical Center, Seoul, 06351 Republic of Korea
- Byoung-Kee Yi
- Department of Artificial Intelligence Convergence, Kangwon National University, 1 Kangwondaehak-Gil, Chuncheon-si, Gangwon-do, 24341, Republic of Korea.
3
Rex DK. Key quality indicators in colonoscopy. Gastroenterol Rep (Oxf) 2023; 11:goad009. [PMID: 36911141 PMCID: PMC10005623 DOI: 10.1093/gastro/goad009] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/22/2022] [Accepted: 01/05/2023] [Indexed: 03/12/2023]
Abstract
Many quality indicators have been proposed for colonoscopy, but most colonoscopists and endoscopy groups focus on measuring the adenoma detection rate and the cecal intubation rate. Use of proper screening and surveillance intervals is another accepted key indicator but it is seldom evaluated in clinical practice. Bowel preparation efficacy and polyp resection skills are areas that are emerging as potential key or priority indicators. This review summarizes and provides an update on key performance indicators for colonoscopy quality.
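The two indicators named in this abstract are simple proportions, which a short Python sketch can make concrete. It assumes the commonly used definitions (adenoma detection rate: share of screening colonoscopies with at least one adenoma detected; cecal intubation rate: share of colonoscopies in which the cecum is reached). The record fields and the four sample procedures are hypothetical.

```python
def rate(records, flag):
    """Proportion of procedures for which the given boolean field is true."""
    if not records:
        return 0.0
    return sum(1 for record in records if record[flag]) / len(records)


# Made-up screening colonoscopies; the field names are assumptions.
procedures = [
    {"adenoma_found": True, "cecum_reached": True},
    {"adenoma_found": False, "cecum_reached": True},
    {"adenoma_found": True, "cecum_reached": True},
    {"adenoma_found": False, "cecum_reached": False},
]

adenoma_detection_rate = rate(procedures, "adenoma_found")
cecal_intubation_rate = rate(procedures, "cecum_reached")
```

In practice the denominator matters as much as the numerator: each indicator is computed over a defined set of eligible procedures, which this sketch takes as given.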
Affiliation(s)
- Douglas K Rex
- Division of Gastroenterology/Hepatology, Indiana University School of Medicine, Indianapolis, IN, USA
4
Benson R, Winterton C, Winn M, Krick B, Liu M, Abu-el-rub N, Conway M, Del Fiol G, Gawron A, Hardikar S. Leveraging Natural Language Processing to Extract Features of Colorectal Polyps From Pathology Reports for Epidemiologic Study. JCO Clin Cancer Inform 2023; 7:e2200131. [PMID: 36753686 PMCID: PMC10166420 DOI: 10.1200/cci.22.00131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 12/06/2022] [Indexed: 02/10/2023]
Abstract
PURPOSE Histopathologic features are critical for studying risk factors of colorectal polyps, but remain deeply embedded within unstructured pathology reports, requiring costly and time-consuming manual abstraction for research. In this study, we developed and evaluated a natural language processing (NLP) pipeline to automatically extract histopathologic features of colorectal polyps from pathology reports, with an emphasis on individual polyp size. These data were then linked with structured electronic health record (EHR) data, creating an analysis-ready epidemiologic data set. METHODS We obtained 24,584 pathology reports from colonoscopies performed at the University of Utah's Gastroenterology Clinic. Two investigators annotated 350 reports to determine inter-rater agreement, develop an annotation scheme, and create a reference standard for performance evaluation. The pipeline was then developed, and performance was compared against the reference for extracting polyp location, histology, size, shape, dysplasia, and the number of polyps. Finally, the pipeline was applied to 24,225 unseen reports and NLP-extracted data were linked with structured EHR data. RESULTS Across all features, our pipeline achieved a precision of 98.9%, a recall of 98.0%, and an F1-score of 98.4%. In patients with polyps, the pipeline correctly extracted 95.6% of sizes, 97.2% of polyp locations, 97.8% of histology, 98.3% of shapes, and 98.3% of dysplasia levels. When applied to unseen data, the pipeline classified 12,889 patients as having polyps, 4,907 patients without polyps, and extracted the features of 28,387 polyps. Tubular adenomas were the most common subtype (55.9%), 8.1% of polyps were advanced adenomas, and the mean polyp size was 0.57 (±0.4) cm. CONCLUSION Our pipeline extracted histopathologic features of colorectal polyps from colonoscopy pathology reports, most notably individual polyp sizes, with considerable accuracy. This study demonstrates the utility of NLP for extracting polyp features and linking these data with EHR data to create an epidemiologic data set to study colorectal polyp risk factors and outcomes.
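The F1-score reported in this abstract is the harmonic mean of precision and recall, so the three figures can be checked against each other. A minimal Python sketch, where the helper name `f1_score` is ours and not part of the study:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract (98.9% and 98.0%).
f1 = f1_score(0.989, 0.980)
print(f"{f1:.3f}")  # prints 0.984
```

The result rounds to 98.4%, which matches the reported value.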
Affiliation(s)
- Ryzen Benson
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
- Maci Winn
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT
- Department of Population Health Sciences, University of Utah, Salt Lake City, UT
- Benjamin Krick
- Department of Political Science, Duke University, Durham, NC
- Mei Liu
- Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS
- Noor Abu-el-rub
- Department of Internal Medicine, University of Kansas Medical Center, Kansas City, KS
- Mike Conway
- School of Computing and Information Systems, University of Melbourne, Parkville, Victoria, Australia
- Guilherme Del Fiol
- Department of Biomedical Informatics, University of Utah, Salt Lake City, UT
- Andrew Gawron
- Salt Lake City VA Specialty Care Center of Innovation, University of Utah, Salt Lake City, UT
- Department of Internal Medicine, University of Utah, Salt Lake City, UT
- Sheetal Hardikar
- Huntsman Cancer Institute, University of Utah, Salt Lake City, UT
- Department of Population Health Sciences, University of Utah, Salt Lake City, UT
5
Binsfeld Gonçalves L, Nesic I, Obradovic M, Stieltjes B, Weikert T, Bremerich J. Natural Language Processing and Graph Theory: Making Sense of Imaging Records in a Novel Representation Frame. JMIR Med Inform 2022; 10:e40534. [PMID: 36542426 PMCID: PMC9813822 DOI: 10.2196/40534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2022] [Revised: 09/13/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022]
Abstract
BACKGROUND A concise visualization framework of related reports would increase readability and improve patient management. To this end, temporal referrals to prior comparative exams are an essential connection to previous exams in written reports. Due to unstructured narrative texts' variable structure and content, their extraction is hampered by poor computer readability. Natural language processing (NLP) permits the extraction of structured information from unstructured texts automatically and can serve as an essential input for such a novel visualization framework. OBJECTIVE This study proposes and evaluates an NLP-based algorithm capable of extracting the temporal referrals in written radiology reports, applies it to all the radiology reports generated for 10 years, introduces a graphical representation of imaging reports, and investigates its benefits for clinical and research purposes. METHODS In this single-center, university hospital, retrospective study, we developed a convolutional neural network capable of extracting the date of referrals from imaging reports. The model's performance was assessed by calculating precision, recall, and F1-score using an independent test set of 149 reports. Next, the algorithm was applied to our department's radiology reports generated from 2011 to 2021. Finally, the reports and their metadata were represented in a modulable graph. RESULTS For extracting the date of referrals, the named-entity recognition (NER) model had a high precision of 0.93, a recall of 0.95, and an F1-score of 0.94. A total of 1,684,635 reports were included in the analysis. Temporal reference was mentioned in 53.3% (656,852/1,684,635), explicitly stated as not available in 21.0% (258,386/1,684,635), and omitted in 25.7% (317,059/1,684,635) of the reports. Imaging records can be visualized in a directed and modulable graph, in which the referring links represent the connecting arrows. CONCLUSIONS Automatically extracting the date of referrals from unstructured radiology reports using deep learning NLP algorithms is feasible. Graphs refined the selection of distinct pathology pathways, facilitated the revelation of missing comparisons, and enabled the query of specific referring exam sequences. Further work is needed to evaluate its benefits in clinics, research, and resource planning.
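The directed graph described in this abstract, with reports as nodes and referrals to prior exams as arrows, can be sketched in a few lines of Python. The report identifiers and the helper `referral_chain` are hypothetical; the sketch assumes the referral links have already been extracted and does not reproduce the authors' representation.

```python
from collections import defaultdict

# Hypothetical data: each pair (report, prior) means that "report"
# refers back to the earlier exam "prior".
referrals = [("R3", "R2"), ("R2", "R1"), ("R5", "R4")]
all_reports = ["R1", "R2", "R3", "R4", "R5", "R6"]

# Adjacency list: report -> list of prior exams it refers to.
graph = defaultdict(list)
for report, prior in referrals:
    graph[report].append(prior)


def referral_chain(report_id):
    """Follow the first referral link backwards until none is left."""
    chain = [report_id]
    seen = {report_id}
    while graph.get(chain[-1]):
        prior = graph[chain[-1]][0]
        if prior in seen:  # guard against accidental cycles
            break
        chain.append(prior)
        seen.add(prior)
    return chain


# Reports that do not point to any prior exam.
without_comparison = [r for r in all_reports if not graph.get(r)]
```

Queries such as `referral_chain("R3")` correspond to the "specific referring exam sequences" mentioned in the conclusions, and `without_comparison` is one simple way to surface reports that lack a comparison.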
Affiliation(s)
- Laurent Binsfeld Gonçalves
- Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Ivan Nesic
- Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Marko Obradovic
- Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Bram Stieltjes
- Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Thomas Weikert
- Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Jens Bremerich
- Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
6
Song G, Chung SJ, Seo JY, Yang SY, Jin EH, Chung GE, Shim SR, Sa S, Hong MS, Kim KH, Jang E, Lee CW, Bae JH, Han HW. Natural Language Processing for Information Extraction of Gastric Diseases and Its Application in Large-Scale Clinical Research. J Clin Med 2022; 11:jcm11112967. [PMID: 35683353 PMCID: PMC9181010 DOI: 10.3390/jcm11112967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Revised: 05/16/2022] [Accepted: 05/23/2022] [Indexed: 11/21/2022]
Abstract
Background and Aims: The utility of clinical information from esophagogastroduodenoscopy (EGD) reports has been limited because of its unstructured narrative format. We developed a natural language processing (NLP) pipeline that automatically extracts information about gastric diseases from unstructured EGD reports and demonstrated its applicability in clinical research. Methods: An NLP pipeline was developed using 2000 EGD and associated pathology reports that were retrieved from a single healthcare center. The pipeline extracted clinical information, including the presence, location, and size, for 10 gastric diseases from the EGD reports. It was validated with 1000 EGD reports by evaluating sensitivity, positive predictive value (PPV), accuracy, and F1 score. The pipeline was applied to 248,966 EGD reports from 2010–2019 to identify patient demographics and clinical information for 10 gastric diseases. Results: For gastritis information extraction, we achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.966, 0.972, 0.996, and 0.967, respectively. Other gastric diseases, such as ulcers, and neoplastic diseases achieved an overall sensitivity, PPV, accuracy, and F1 score of 0.975, 0.982, 0.999, and 0.978, respectively. The study of EGD data of over 10 years revealed the demographics of patients with gastric diseases by sex and age. In addition, the study identified the extent and locations of gastritis and other gastric diseases, respectively. Conclusions: We demonstrated the feasibility of the NLP pipeline providing an automated extraction of gastric disease information from EGD reports. Incorporating the pipeline can facilitate large-scale clinical research to better understand gastric diseases.
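The four validation metrics named in this abstract all derive from one confusion matrix. The Python sketch below uses made-up counts, not the study's data, to show how they relate; the function name `diagnostic_metrics` is an assumption, and all denominators are assumed to be non-zero.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, PPV, accuracy, and F1 from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),            # also called recall
        "ppv": tp / (tp + fp),                    # also called precision
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Hypothetical counts: a finding present in 100 of 1000 reports.
metrics = diagnostic_metrics(tp=90, fp=5, fn=10, tn=895)
```

Because accuracy also counts true negatives, it tends to sit above sensitivity and F1 whenever most reports do not mention the finding, a pattern consistent with the values reported here.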
Affiliation(s)
- Gyuseon Song
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; (G.S.); (S.R.S.); (S.S.); (M.S.H.); (K.H.K.); (E.J.); (C.W.L.)
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Su Jin Chung
- Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea; (S.J.C.); (J.Y.S.); (S.Y.Y.); (E.H.J.); (G.E.C.)
- Ji Yeon Seo
- Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea; (S.J.C.); (J.Y.S.); (S.Y.Y.); (E.H.J.); (G.E.C.)
- Sun Young Yang
- Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea; (S.J.C.); (J.Y.S.); (S.Y.Y.); (E.H.J.); (G.E.C.)
- Eun Hyo Jin
- Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea; (S.J.C.); (J.Y.S.); (S.Y.Y.); (E.H.J.); (G.E.C.)
- Goh Eun Chung
- Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea; (S.J.C.); (J.Y.S.); (S.Y.Y.); (E.H.J.); (G.E.C.)
- Sung Ryul Shim
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; (G.S.); (S.R.S.); (S.S.); (M.S.H.); (K.H.K.); (E.J.); (C.W.L.)
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Department of Health and Medical Informatics, Kyungnam University College of Health Sciences, Changwon 51767, Korea
- Soonok Sa
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; (G.S.); (S.R.S.); (S.S.); (M.S.H.); (K.H.K.); (E.J.); (C.W.L.)
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Moongi Simon Hong
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; (G.S.); (S.R.S.); (S.S.); (M.S.H.); (K.H.K.); (E.J.); (C.W.L.)
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Kang Hyun Kim
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; (G.S.); (S.R.S.); (S.S.); (M.S.H.); (K.H.K.); (E.J.); (C.W.L.)
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Eunchan Jang
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; (G.S.); (S.R.S.); (S.S.); (M.S.H.); (K.H.K.); (E.J.); (C.W.L.)
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Chae Won Lee
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; (G.S.); (S.R.S.); (S.S.); (M.S.H.); (K.H.K.); (E.J.); (C.W.L.)
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Jung Ho Bae
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; (G.S.); (S.R.S.); (S.S.); (M.S.H.); (K.H.K.); (E.J.); (C.W.L.)
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Department of Internal Medicine and Healthcare Research Institute, Healthcare System Gangnam Center, Seoul National University Hospital, 39FL Gangnam Finance Center 152, Teheran-ro, Gangnam-gu, Seoul 06236, Korea; (S.J.C.); (J.Y.S.); (S.Y.Y.); (E.H.J.); (G.E.C.)
- Correspondence: (J.H.B.); (H.W.H.); Tel.: +82-2-2112-5574 (J.H.B.); +82-31-881-7109 (H.W.H.); Fax: +82-2-2112-5635 (J.H.B.); +82-31-881-7069 (H.W.H.)
- Hyun Wook Han
- Department of Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea; (G.S.); (S.R.S.); (S.S.); (M.S.H.); (K.H.K.); (E.J.); (C.W.L.)
- Institute for Biomedical Informatics, CHA University School of Medicine, CHA University, Seongnam 13488, Korea
- Correspondence: (J.H.B.); (H.W.H.); Tel.: +82-2-2112-5574 (J.H.B.); +82-31-881-7109 (H.W.H.); Fax: +82-2-2112-5635 (J.H.B.); +82-31-881-7069 (H.W.H.)